STRUCTURAL INTERACTION FINGERPRINT (SIFt)

PRIORITY CLAIM 

[0001] This application claims priority under 35 U.S.C. § 119(e) to U.S. Application
No. 60/524,083, filed November 24, 2003, and U.S. Application No. 60/484,308, filed 
July 3, 2003, each of which is incorporated by reference in its entirety. 

BACKGROUND 

[0002] Representing and understanding the three-dimensional structural information
of biological molecules is becoming a critical step in the rational drug discovery process.
With the advent of massive virtual chemical library screening, as well as the recent
advancements in X-ray crystallography, NMR and homology modeling techniques, the
amount of structural information increases at an explosive speed, and traditional analysis
methods are inadequate and inefficient in dealing with such massive structural 
information. 

[0003] The past decade has seen an explosion of the three-dimensional structural
information of biologically important molecules, thanks to the recent development of
X-ray crystallography, NMR and molecular modeling techniques. There are currently about
20,000 holdings deposited in the Protein Data Bank, and a significant portion of these
structures contain ligands bound to macromolecules. In addition, combinatorial
chemistry and virtual library screening are becoming routine procedures in the drug
discovery process. This process generates thousands to millions of virtual protein-ligand
complex structures, making detailed examination of these structures a daunting task.
Representing the three-dimensional structural information of macromolecules has always
been a challenge due to the complexity of identifying residues and atomic interactions.
Representing the covalent or non-covalent interactions between molecules poses even
more difficult challenges, because not only is the geometric location of each interaction
needed, but the direction, type, and magnitude of the interaction are also important
and need to be captured. Understanding the intermolecular interactions between proteins
and their ligands is of great importance as it provides insights into the functional
mechanism of the proteins. It is important for structure-based drug design to understand
the key forces between small molecules (SMs) and proteins and to be able to compare
different orientations or different small molecules binding to the same receptor site, or
different binding sites.
[0004] Traditionally, understanding and comparing the interactions between proteins
and ligands is achieved by visually inspecting individual structures with
structure-rendering software on a graphic terminal, sometimes facilitated by other programs
that generate 2-D or 3-D schematic representations of the interactions (e.g., LIGPLOT).
Such time-consuming processes require human intervention and become more and
more tedious as the number of complex structures increases. It is important for successful
drug discovery to have a tool that allows this massive amount of structural information to
be organized and analyzed.

[0005] More recently, structure-based virtual chemical library screening has become a
routine procedure in the drug discovery process. Virtual library screening typically
generates hundreds of thousands of virtual protein-ligand complex structures. Effectively
mining this massive structural library becomes a tremendous task, as it is impossible to
analyze the structures individually. Traditionally, different types of empirical docking
scores and some pharmacophoric filters are used to sift the docking results for tight
binders with desired binding interactions. However, these methods have limitations. The
correlation between good docking scores and high activity is not [illegible], and
docking scores are an overall summation of interaction and do not discern differences
in binding modes. Therefore, a method that allows accurate representation of the
interaction and fast analysis of a large number of structures is in great demand.

SUMMARY 

[0006] In one aspect, a method is provided for generating a structural interaction
fingerprint (SIFt). The SIFt is in the form of an information string which includes a
plurality of information blocks, and each information block includes a plurality of
information units. The method includes the steps of selecting a plurality of positions
(selected positions) on a target molecule where each selected position corresponds to an
information block in the information string; selecting a plurality of interaction types and
calculating a value that is indicative of the characteristic of each interaction type at each
selected position of the target molecule; assigning the value to the corresponding
information unit thereby indicating the characteristic of that particular interaction at
the corresponding selected position; and joining the information units of each selected
position together to form the corresponding information blocks, which join together to
generate a SIFt.
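
By way of illustration only, the steps recited above can be sketched in Python for the binary case. The function name `build_sift`, the interaction-type labels, their order, and the residue names are assumptions made for this sketch and are not taken from the specification; any consistent ordering of positions and interaction types would serve equally well.

```python
# Hypothetical interaction-type labels and ordering (assumed for this sketch).
INTERACTION_TYPES = ["contact", "polar", "nonpolar", "hbond_donor", "hbond_acceptor"]


def build_sift(positions, interaction_types, value_of):
    """Join one information block per selected position into one string.

    value_of(position, interaction_type) returns the value of a single
    information unit (here 0 or 1).
    """
    blocks = []
    for pos in positions:
        # One information unit per interaction type at this position.
        units = [str(value_of(pos, itype)) for itype in interaction_types]
        blocks.append("".join(units))  # information block for this position
    return "".join(blocks)  # information string (the SIFt)


# Toy input: interaction types observed at each selected position.
# The residue names are placeholders, not data from the specification.
observed = {
    "LYS53": {"contact", "polar", "hbond_donor"},
    "MET109": {"contact", "nonpolar"},
    "ASP168": set(),
}


def presence(pos, itype):
    """Binary information unit: 1 if the interaction is present, else 0."""
    return 1 if itype in observed[pos] else 0


sift = build_sift(["LYS53", "MET109", "ASP168"], INTERACTION_TYPES, presence)
print(sift)  # 110101010000000
```

Because every block has the same length and the positions are visited in a fixed order, a given bit index always refers to the same interaction type at the same position, which is what allows two such strings to be compared directly.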
[0007] The target molecule can be a protein or a fragment thereof, such as a peptide
(e.g., polypeptide or oligopeptide). Alternatively, a target molecule can be a nucleic acid.
In certain circumstances, the ligand can be a peptide, a nucleic acid, or even a small
molecule (e.g., an organic molecule (e.g., molecular weight equal to or less than 1,500
dalton) that is neither a peptide nor a nucleic acid).

[0008] Note that the target molecule is forming a complex with a ligand (i.e., the
binary complex), and the selected positions are the positions on the target molecule that
participate in intermolecular interaction with the ligand. These positions can be obtained
from a three-dimensional structure of a binary complex formed between the target
molecule and the ligand. The three-dimensional structure can be derived from an
experimental method or a prediction method such as, for example, an in silico prediction
method. In one embodiment, a set of selected positions can be obtained from comparing
the common positions (e.g., residues or bases) of the target molecule that participate in
intermolecular interactions among a set of target molecule-ligand structures. The target 
molecule can be the same or different in the set of target molecule-ligand structures. 
[0009] For a protein or peptide target molecule, each selected position can include
one or more secondary structure elements (e.g., an α-helix or a β-strand), amino acid
residues (e.g., a lysine residue), main chain atom groups (e.g., the α-carbon of a particular
amino acid residue), side chain atom groups (e.g., the butylamine group of a Lys), or
individual atoms of the target molecule. As to a nucleic acid target molecule, each
selected position can include one or more bases, functional groups, or individual atoms of
the target molecule. 

[0010] The value that is assigned to a particular information unit can be a binary
value or a numeric value selected from a scale or range of numbers. The binary value 
indicates whether a particular interaction type is present (1) or absent (0) at the 
corresponding selected position of the target molecule, whereas the numeric value 
indicates the magnitude of a particular interaction type at the corresponding selected 
position of the target molecule (e.g., a value of "3" in a scale that ranges from "0" to "5").
[0011] As mentioned above, the value indicates the characteristic of a particular
interaction type at that selected position. Note that the interaction types represent
different types of intermolecular interactions between the target molecule and the ligand.
For example, the interaction type can be classified as contact interaction. One can detect
the presence of contact interaction between a target molecule and a ligand at a selected
position (e.g., a protein residue) according to a number of methods. In one embodiment,
the target molecule-ligand pair is considered to have established contact interaction at a
selected position if the interaction involves a change or reduction in the accessible surface
area at that position of the target molecule upon forming a complex with the ligand.
Alternatively, one can measure the intermolecular distance between a target molecule and
a ligand at a selected position to determine whether contact interaction occurs at that
position (i.e., whether the intermolecular distance is within the predetermined distance
cutoff limit). In one embodiment, the target molecule-ligand pair is considered to be
interacting if the interatomic contact distance between the target molecule and the ligand
is equal to or less than 10 Å (e.g., equal to or less than 6 Å, or even 4 Å). The interaction
type can be further classified as polar interaction, non-polar interaction, and/or hydrogen
bonding interaction, depending on the nature of the interactions. In one embodiment, the
hydrogen bonding interaction can involve a hydrogen bond donor in the target molecule
and a hydrogen bond acceptor in the ligand at the selected position. In one embodiment,
the hydrogen bonding interaction can involve a hydrogen bond acceptor in the target
molecule and a hydrogen bond donor in the ligand at the selected position. Note that
intermolecular interactions can be characterized by an interaction energy-based approach.
The interaction type can be characterized by the contribution of the selected position to
the interaction energy between a target molecule and a ligand, where the total interaction
energy between the target and the ligand is summed over all positions. The interaction
energy may be computed by a variety of scoring functions or intermolecular force-fields
such as common ligand-receptor docking scoring functions (e.g., Dock, Gold,
ChemScore, FlexX score, PMF, Screenscore, Drugscore, etc.) or intermolecular potential
energy functions or force-fields (e.g., CHARMM, Amber, OPLS, etc.). The interaction
energy calculated for each information unit (which corresponds to a selected position)
may take the form of a real number (e.g., -43.2 kcal/mol), an integer (e.g., -43 kcal/mol), or
an integer representing a binned form of the interaction energy. In the latter case, the
energy range of the function is divided into bins (e.g., [-70 to -50 kcal/mol], [-50 to -20
kcal/mol], [-20 to 0 kcal/mol], or [0-10 kcal/mol]) where the interaction energy is
represented as an integer identifying the bin (in this case, for example, 1, 2, 3, or 4).
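
As a non-limiting sketch, the distance-cutoff test for contact interaction and the binning of an interaction energy might be written as follows. The function names, the coordinate representation as (x, y, z) tuples in ångströms, the 4 Å default, and the treatment of each bin as closed on its lower edge and open on its upper edge are assumptions of this sketch; the bin boundaries themselves are the example values given above.

```python
import math


def has_contact(position_atoms, ligand_atoms, cutoff=4.0):
    """Return 1 if any atom of the selected position lies within `cutoff`
    angstroms of any ligand atom, otherwise 0."""
    for p in position_atoms:
        for q in ligand_atoms:
            if math.dist(p, q) <= cutoff:
                return 1
    return 0


# Bin edges in kcal/mol, taken from the example ranges in the text.
BIN_EDGES = [-70.0, -50.0, -20.0, 0.0, 10.0]


def energy_bin(energy):
    """Map an interaction energy to a bin number 1..4.

    Assumption: each bin includes its lower edge and excludes its upper
    edge. Energies outside the covered range return None.
    """
    for i in range(len(BIN_EDGES) - 1):
        if BIN_EDGES[i] <= energy < BIN_EDGES[i + 1]:
            return i + 1
    return None


residue_atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
ligand_atoms = [(5.0, 0.0, 0.0)]
print(has_contact(residue_atoms, ligand_atoms))              # 1 (3.5 A apart)
print(has_contact(residue_atoms, ligand_atoms, cutoff=3.0))  # 0
print(energy_bin(-43.2))                                     # 2
```

The nested loop is quadratic in the number of atoms, which is adequate for a single residue against a small ligand; a spatial index would be a natural substitute for larger systems.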
[0012] In one aspect, a method of predicting the interaction pattern between a target
molecule and a test ligand is provided. A test ligand is a ligand whose affinity to the 
target molecule is under examination. The prediction method involves identifying a 
plurality of selected positions between the target molecule and a first ligand, wherein the 
first ligand is known to bind to the target molecule (i.e., the affinity between the first 
ligand and the target molecule is known). As described above, selected positions are
positions on the target molecule that participate in intermolecular interactions with the 
ligand (here, the first ligand). Based on the selected positions, the method then involves 
generating a first structural interaction fingerprint (SIFt) as described above (i.e., 
formation of an information string that includes a plurality of information blocks, where 
each information block includes a plurality of information units, and where each 
information unit is assigned a calculated value indicative of the presence/absence or the 
magnitude of a particular interaction type at the selected position of the target molecule to 
which the information unit/block corresponds). Using the same selected positions, the 
method then involves the generation of a second SIFt between the same target molecule 
and a second ligand (i.e., a test ligand) employing the same steps as described above. 
Finally, the method involves comparing the first SIFt with the second SIFt to determine 
the level of overlap between the first and second SIFts. A pattern of substantial
overlap between the two SIFts predicts that the second ligand interacts with the target
molecule in a pattern similar to that of the first ligand. In one embodiment, the first ligand is the
natural ligand of the target molecule. In one embodiment, the first ligand is a ligand of 
known affinity to the target molecule. 
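
One simple way to quantify the level of overlap between two binary SIFts generated over the same selected positions is to count the interactions present in both. The following is a minimal sketch under that assumption; the function names and the choice to normalize by the number of interactions in the first (reference) SIFt are illustrative and are not prescribed by the method.

```python
def shared_interactions(sift_a, sift_b):
    """Count the information units that are '1' in both SIFts."""
    if len(sift_a) != len(sift_b):
        raise ValueError("SIFts must be built over the same selected positions")
    return sum(1 for a, b in zip(sift_a, sift_b) if a == "1" and b == "1")


def overlap_fraction(reference, test):
    """Fraction of the reference ligand's interactions reproduced by the test ligand."""
    ref_on = reference.count("1")
    if ref_on == 0:
        return 0.0
    return shared_interactions(reference, test) / ref_on


first_sift = "1101010100"   # first ligand (known binder), toy data
second_sift = "1100010110"  # test ligand, toy data
print(shared_interactions(first_sift, second_sift))  # 4
print(overlap_fraction(first_sift, second_sift))     # 0.8
```

What counts as substantial overlap is left open here; a threshold on the fraction would have to be chosen for the application at hand.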

[0013] In one aspect, a method of generating a structural interaction fingerprint (SIFt)
database is provided. The method involves (1) identifying a plurality of selected
positions on a target molecule (which forms a complex with a first ligand) and (2) 
generating a first SIFt of the database as described above (i.e., formation of an 
information string that includes a plurality of information blocks where each information 
block includes a plurality of information units, and where each information unit is 
assigned a calculated value indicative of the presence/absence or the magnitude of a 
particular interaction type at the selected position of the target molecule to which the 
information unit/block corresponds). The method then requires that steps (1) and (2) be 
repeated using the same target molecule but a different ligand such that another SIFt can 
be generated and added to the database. The method then repeats steps (1) and (2) with
different ligands and generates more SIFts until the database contains a desired number of
SIFts. In one embodiment, the method further involves analyzing the SIFts of the
database to generate one or more interaction patterns between the target molecule and the
ligands. Typically, ligands that belong to a particular interaction pattern bind to the
target molecule in a similar manner. In one embodiment, the method further
involves comparing one (or more) interaction pattern of the database with a SIFt
generated by using the same target molecule and a test ligand. A test ligand is a ligand
that was not employed in generating the database. From the degree of similarity between 
the SIFt generated using the test ligand and the interaction pattern, one can predict 
whether or not the test ligand binds to the target molecule in a similar manner. One can 
even predict whether or not the test ligand belongs to the same family of ligands used to
generate the database. In one embodiment, the method further includes the step of storing 
the database in a computer readable medium. 
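
For illustration, a SIFt database for one target molecule and several ligands could be assembled and stored as sketched below. The JSON file layout, the function names, and the toy positions, interaction types, and ligand names are all assumptions of this sketch; the specification does not prescribe a storage format.

```python
import json
import os
import tempfile


def build_database(complexes, positions, interaction_types):
    """Generate one binary SIFt per ligand over the same selected positions.

    `complexes` maps a ligand name to {position: set of observed interaction types}.
    """
    database = {}
    for ligand, observed in complexes.items():
        bits = []
        for pos in positions:
            seen = observed.get(pos, set())
            bits.extend("1" if itype in seen else "0" for itype in interaction_types)
        database[ligand] = "".join(bits)
    return database


def save_database(database, path):
    """Store the database on a computer-readable medium as JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(database, handle, indent=2, sort_keys=True)


def load_database(path):
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Toy data: two selected positions, two interaction types, two ligands.
POSITIONS = ["P1", "P2"]
TYPES = ["contact", "hbond"]
COMPLEXES = {
    "ligand_A": {"P1": {"contact", "hbond"}, "P2": {"contact"}},
    "ligand_B": {"P1": {"contact"}},
}

database = build_database(COMPLEXES, POSITIONS, TYPES)
with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "sift_db.json")
    save_database(database, db_path)
    reloaded = load_database(db_path)
print(reloaded)  # {'ligand_A': '1110', 'ligand_B': '1000'}
```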

[0014] In one aspect, a method of analyzing the interaction pattern of two or more
related target molecules is provided. The method includes conducting sequence and
structural alignments among each of the related target molecules to derive a
uniform residue or base numbering system. The method then involves identifying a
plurality of selected positions on the target molecule of each target molecule-ligand
complex using the uniform residue or base numbering system. This is followed by
generating a SIFt for each target molecule-ligand complex as described above and
comparing different SIFt patterns. The interactions can be conserved or unconserved.
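
As a hedged sketch of how a uniform numbering system might be derived, the columns of a gapped sequence alignment can serve as the shared numbering, with each target's own residue numbers mapped onto those columns. The alignment strings, target names, and function name below are invented for illustration, and an alignment is assumed to have been computed already by some external tool.

```python
def column_to_residue(aligned_seq, first_residue=1):
    """Map 1-based alignment columns to native residue numbers.

    Gap characters ('-') occupy a column but do not consume a residue number.
    """
    mapping = {}
    resnum = first_residue
    for column, symbol in enumerate(aligned_seq, start=1):
        if symbol != "-":
            mapping[column] = resnum
            resnum += 1
    return mapping


# Toy alignment of two related targets (placeholder sequences).
alignment = {
    "target_A": "MK-LVE",
    "target_B": "MKGL-E",
}
numbering = {name: column_to_residue(seq) for name, seq in alignment.items()}

# Columns occupied in every target are candidates for shared selected positions.
shared_columns = sorted(
    set.intersection(*(set(mapping) for mapping in numbering.values()))
)
print(shared_columns)                # [1, 2, 4, 6]
print(numbering["target_A"][4])      # 3
print(numbering["target_B"][4])      # 4
```

Building each SIFt over the shared columns rather than over native residue numbers keeps a given bit aligned to structurally equivalent positions across the related targets.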
[0015] The method can include compiling the SIFts to identify selected interactions 
that are conserved among the complexes. The method can include calculating a score for 
each interaction among the target molecule-ligand complexes. The score can include a 
conservation score. The method can include compiling the SIFts to form an interaction 
profile from the calculated conservation score, or comparing a SIFt generated from a test
ligand with an interaction profile generated from a group of target molecule-ligand
complexes, thereby predicting whether the test ligand interacts with the target molecule in
a similar pattern with the group. The method can include comparing two interaction
profiles, thereby predicting whether two groups of structures share conserved binding
interactions, and/or have similar binding patterns.
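
The following sketch shows one possible reading of a conservation score and an interaction profile. Here the score of an interaction is assumed to be the fraction of complexes in which that bit is set, and a test SIFt is compared with the profile by the share of the profile's total weight that it reproduces. Both definitions, and all function names, are assumptions chosen for illustration; the text above does not fix a formula.

```python
def interaction_profile(sifts):
    """Per-bit conservation score: fraction of SIFts in which the bit is '1'."""
    if not sifts:
        raise ValueError("at least one SIFt is required")
    length = len(sifts[0])
    if any(len(s) != length for s in sifts):
        raise ValueError("all SIFts must have the same length")
    count = len(sifts)
    return [sum(s[i] == "1" for s in sifts) / count for i in range(length)]


def conserved_bits(profile, threshold=1.0):
    """Indices of interactions whose conservation score reaches the threshold."""
    return [i for i, score in enumerate(profile) if score >= threshold]


def profile_match(test_sift, profile):
    """Share of the profile's total weight covered by the test SIFt's '1' bits."""
    if len(test_sift) != len(profile):
        raise ValueError("SIFt and profile must have the same length")
    total = sum(profile)
    if total == 0:
        return 0.0
    covered = sum(score for bit, score in zip(test_sift, profile) if bit == "1")
    return covered / total


group = ["1101", "1001", "1100"]  # toy SIFts for three complexes
profile = interaction_profile(group)
print(conserved_bits(profile))        # [0]
print(conserved_bits(profile, 0.6))   # [0, 1, 3]
print(round(profile_match("1000", profile), 3))  # 0.429
```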

[0016] As used herein, the target molecules are related if they exhibit at least 20% 
sequence similarity or a structural similarity with a root-mean squared deviation over the 
aligned positions no greater than 4 Å (e.g., 6 Å). In yet another embodiment, the target
molecules are related if they exhibit at least 20% protein sequence similarity with a
root-mean squared deviation over the aligned positions no greater than 6 Å. For protein target
molecules, sequence and structural alignments are commonly applied within the structural 
biology field. There are databases including the PFAM database that includes protein 
sequence alignments (http://www.sanger.ac.uk/software/Pfam/index.shtml) and the SCOP 
database (http://scop.mrc-lmb.cam.ac.uk/scop/) that contains protein structural
alignments. 

[0017] In some embodiments, at least one interaction type includes a chemical or
physical property of a part of the ligand interacting with each selected position. In other
embodiments, each interaction type includes a chemical and physical property of a part of
the ligand interacting with each selected position. The interaction types can include
information bits about the chemical composition of a ligand (e.g., various R groups in a
combinatorial library), or an experimentally determined or computed property of the part
of the ligand interacting with the selected position. For example, interaction types can
include information bits representing varying groups of a combinatorial library.
Properties and descriptors of a molecule or part of a molecule can include fragment
constant descriptors (e.g., hydrophobic, hydrogen bond acceptor, hydrogen bond donor,
hydrophobic aliphatic, hydrophobic aromatic, negative charge, negative ionizable,
positive charge, positive ionizable, or aromatic ring), electronic descriptors (e.g., charge,
partial positive surface area, partial negative surface area, dipole moment, atomic
polarizability, polar surface area), topological descriptors (e.g., Wiener index, Zagreb
index, Hosoya index), molecular flexibility index, spatial descriptors (e.g., shadow
indices, molecular surface area, density, principal moment of inertia, molecular volume),
structural descriptors (e.g., number of chiral centers, molecular weight, number of
rotatable bonds), or thermodynamic descriptors (e.g., partition coefficient, desolvation
free energies for water and octanol, pKa). The interaction type can also include a
chemical fingerprint for a part of the ligand interacting with the selected position of the
target molecule. A chemical fingerprint is a string of values (usually an array of binary
bits) that contains the unique information about the chemical makeup (e.g., atoms,
substructures, chirality) of the molecule. In some embodiments, the interaction types can 
also include information about the selected position in the target molecule, such as 
variables measuring the sequence conservation, structural conservation and flexibility of 
the selected position of the target molecule. 

[0018] In a further aspect, a computer-readable data storage medium is provided. The
medium includes a data storage material encoded with a computer-readable database.
The database includes a plurality of SIFts generated from a target molecule and a plurality
of ligands. Each SIFt is in the form of an information string that includes a plurality of
information blocks, and each information block includes a plurality of information units.
The target molecule interacts with each ligand at a plurality of selected positions on the
target molecule via a number of interaction types. As described above, selected positions
are positions on the target molecule that participate in intermolecular interaction with the 
ligand. The magnitude of each interaction type at each selected position is calculated and 
represented by a value, which is assigned to a corresponding information unit. The target 
molecule can be a protein, a peptide, or a nucleic acid, and the ligand can be a small
molecule, a peptide, a protein or a nucleic acid. In one embodiment, the value that is 
assigned to an information unit is a binary value, which indicates the presence or absence 
of a particular interaction type at the corresponding selected position. In one 
embodiment, the value that is assigned to an information unit is selected from a range of 
scaled numeric values, which indicates the magnitude of a particular interaction type at 
the corresponding selected position. For a protein/peptide target molecule, each selected 
position can include one or more amino acid residues, main chain atom groups, side chain 
atom groups, or individual atoms of the target molecule. For a nucleic acid target 
molecule, each selected position can include one or more bases, functional groups, or 
individual atoms of the target molecule. In one embodiment, the interaction type can be a 
contact interaction. For example, the interatomic contact distance between the target 
molecule and the ligand can be equal to or less than 10 Å (e.g., equal to or less than 6 Å, or
even 4 Å) for the target molecule-ligand pair to be considered as having contact
interaction. As another example, the contact interaction can include a change in the 
accessible surface area of the target molecule upon forming a complex with the ligand. In 
one embodiment, the interaction type can be a polar interaction, non-polar interaction, 
and hydrogen bond interaction. In one embodiment, the hydrogen bond interaction can 
include a hydrogen bond donor in the target molecule and a hydrogen bond acceptor in 
the ligand at the corresponding selected position. In one embodiment, the hydrogen bond 
interaction can include a hydrogen bond acceptor in the target molecule and a hydrogen 
bond donor in the ligand at the corresponding selected position. 

[0019] In yet a further aspect, a computer program for generating a SIFt that is in the
form of an information string comprising a plurality of information blocks, where each
information block includes a plurality of information units, is provided. The computer
program contains instructions for causing a computer system to select a plurality of
positions (selected positions) on a target molecule (which is forming a complex with a
ligand). The selected positions are positions on the target molecule that participate in
intermolecular interaction with the ligand. Each selected position corresponds to an
information block in the information string. The computer program can perform one or
more of the following steps: select a plurality of interaction types that exist between the
target molecule and the ligand; calculate a value that is indicative of the characteristic of
each interaction type at each selected position of the target molecule; assign the value to
the corresponding information unit so as to indicate the characteristic of that particular
interaction type at the corresponding selected position; join the information units of each
selected position together to form the corresponding information blocks; and join the
information blocks to generate a SIFt. The target molecule can be a protein, a peptide, or
a nucleic acid, and the ligand can be a small molecule, a peptide, or a nucleic acid. In one
embodiment, the value that is assigned to an information unit is a binary value, which
indicates the presence or absence of a particular interaction type at the corresponding
selected position. In one embodiment, the value that is assigned to an information unit is
selected from a range of scaled numeric values, which indicates the magnitude of a
particular interaction type at the corresponding selected position. In one embodiment,
the selected positions are obtained from a three-dimensional structure of a binary complex
formed between the target molecule and the ligand. Such a three-dimensional structure
may be derived from an experimental method or a prediction method such as, for
example, an in silico prediction method. For a protein/peptide target molecule, each
selected position can include one or more amino acid residues, main chain atom groups,
side chain atom groups, or individual atoms of the target molecule. For a nucleic acid
target molecule, each selected position can include one or more bases, functional groups,
or individual atoms of the target molecule. The interaction types represent different types
of intermolecular interactions between the target molecule and the ligand and can be
characterized by a binding energy-based approach. In one embodiment, the interaction type
can be a contact interaction. For example, the interatomic contact distance between
the target molecule and the ligand can be equal to or less than 10 Å (e.g., equal to or less
than 6 Å, or even 4 Å) for the target molecule-ligand pair to be considered as having contact
interaction. As another example, the contact interaction can include a change in the
accessible surface area of the target molecule upon forming a complex with the ligand. In
one embodiment, the interaction type can be a polar interaction, non-polar interaction,
and hydrogen bond interaction. In one embodiment, the hydrogen bond interaction can
include a hydrogen bond donor in the target molecule and a hydrogen bond acceptor in
the ligand at the corresponding selected position. In one embodiment, the hydrogen bond
interaction can include a hydrogen bond acceptor in the target molecule and a hydrogen
bond donor in the ligand at the corresponding selected position. In one embodiment, the
method can further include instructions to store the SIFt in a database. In one
embodiment, the computer program can include instructions for generating a plurality of
SIFts by repeating the steps recited above using, e.g., the same target molecule and
selected positions, but different ligands. The plurality of SIFts may then be stored in a
database. In one embodiment, the computer program can further include instructions to
generate a SIFt using the same target molecule and a test ligand, and to compare this SIFt
with another SIFt (e.g., generated using the same target and a known ligand) or another
group of SIFts (i.e., either one SIFt or a plurality of SIFts forming an [illegible]).
Various methods can be used to compare the generated SIFt with one or more other SIFts.
For example, a comparison can be performed using a simple sum of matching bits (units)
across the entire SIFt, or by the application of one or more similarity measures
(including, e.g., Tanimoto coefficient, Euclidean distance, cosine correlation coefficient,
correlation, half square Euclidean distance, and city block distance). Furthermore, a
library of SIFts can be compared by, for example, first carrying out all pairwise
comparisons using one of the similarity measures mentioned above and then applying
hierarchical clustering to group SIFts according to the similarity. The clustering can use,
for example, one or more common cluster similarity methods (including, e.g., UPGMA
(Unweighted Pair-Group Method with Arithmetic mean), WPGMA (Weighted Pair-Group
Method with Arithmetic mean), single linkage, complete linkage, and Ward's
method).
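
By way of example only, a Tanimoto comparison of binary SIFts followed by a naive average-linkage (UPGMA-style) agglomeration can be sketched as below. The stopping rule based on a minimum similarity, the function names, and the toy SIFts are assumptions of this sketch, and the quadratic search over cluster pairs is written for clarity rather than speed; a library routine for hierarchical clustering would normally be used for a large library of SIFts.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient of two equal-length binary strings."""
    if len(a) != len(b):
        raise ValueError("SIFts must have the same length")
    both = sum(x == "1" and y == "1" for x, y in zip(a, b))
    either = sum(x == "1" or y == "1" for x, y in zip(a, b))
    return both / either if either else 1.0


def average_linkage_clusters(sifts, min_similarity):
    """Merge the most similar pair of clusters until no pair reaches
    `min_similarity`. Cluster similarity is the mean pairwise Tanimoto
    coefficient between members (average linkage)."""
    clusters = [[i] for i in range(len(sifts))]

    def mean_similarity(c1, c2):
        total = sum(tanimoto(sifts[i], sifts[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda pair: mean_similarity(clusters[pair[0]], clusters[pair[1]]),
        )
        if mean_similarity(clusters[i], clusters[j]) < min_similarity:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


library = ["111000", "110000", "000111", "000011"]  # toy SIFts
print(round(tanimoto(library[0], library[1]), 3))    # 0.667
print(average_linkage_clusters(library, 0.5))        # [[0, 1], [2, 3]]
```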
[0020] As used herein, a target molecule generally refers to a biomolecule whose
functions are desired to be modulated. A target molecule contains a region (i.e., binding
site) that allows it to bind to one or more ligands that satisfy the binding criteria. A target
molecule can be a macromolecule such as a protein (or even a polypeptide) or a nucleic
acid. A target molecule is typically a bio-macromolecule whose functions can be altered
when it is bound to a molecule (i.e., ligand) that fits its binding or active site.

[0021] As used herein, a ligand refers to a molecule that binds to the binding or active
site of a target molecule. A ligand is typically a smaller molecule [illegible]
and typically binds to a target molecule with high affinity (e.g., with a K[illegible] of at least 1
[illegible]M). A ligand can be a natural ligand or substrate (i.e., naturally occurring in a
biological system) to the target molecule, e.g., ATP to certain kinases such as p38. A
ligand can also be a small molecule inhibitor, e.g., SB203580, which is a well-known
inhibitor of p38.
[0022] As used herein, a naturally occurring amino acid is defined as one of the
twenty amino acids naturally occurring in proteins. These naturally occurring amino
acids are the L-isomers of glycine, alanine, valine, leucine, isoleucine, serine, methionine,
threonine, phenylalanine, tyrosine, tryptophan, cysteine, proline, histidine, aspartic acid,
asparagine, glutamic acid, glutamine, arginine, and lysine. A so-called "unnatural amino
acid" is any amino acid other than the twenty named above. Included are D-isomers of
the twenty amino acids named above, D or L isomers or racemic mixtures of
selenocysteine and selenomethionine, and the D or L forms (or racemic mixtures) of, e.g.,
nor-leucine, para-nitrophenylalanine, homophenylalanine, para-fluorophenylalanine,
3-amino-2-benzylpropionic acid, homoarginine, and the like. These unnatural amino acids
may be used, e.g., in rational drug design in developing inhibitors and/or binding
molecules to modulate a protein's activity.

[0023] An amino acid is a molecule having the structure where a central carbon atom
(the α-carbon atom) is linked to a hydrogen atom, a carboxylic acid group (the carbon
atom of which is referred to herein as a "carboxyl carbon atom"), an amino group (the
nitrogen atom of which is referred to herein as an "amino nitrogen atom"), and a side
chain group that is linked to the α-carbon atom. For example, the side chain group of
alanine is a methyl group. Any atom that is not part of a side chain group is a main chain
atom, e.g., the α-carbon atom or the hydrogen that joins this carbon atom.
[0024] A positively charged amino acid is any naturally occurring or unnatural amino
acid having a side chain that is positively charged under normal physiological conditions.
The positively charged, naturally occurring amino acids are arginine, lysine, and
histidine. A negatively charged amino acid is any naturally occurring or unnatural amino
acid having a side chain that is negatively charged under normal physiological conditions.
Examples of negatively charged, naturally occurring amino acids are aspartic acid and
glutamic acid. A hydrophobic amino acid is any naturally occurring or unnatural amino
acid that contains a hydrophobic side chain group. Examples of naturally occurring
hydrophobic amino acids are alanine, leucine, isoleucine, valine, proline, phenylalanine,
tryptophan, and methionine. An uncharged, hydrophilic amino acid is any naturally
occurring or unnatural amino acid that contains a hydrophilic side chain group but is
uncharged at physiological pH. Examples of naturally occurring uncharged, hydrophilic
amino acids are serine, threonine, tyrosine, asparagine, glutamine, and cysteine.
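
The classes defined above lend themselves to a simple lookup table, sketched here as an illustration. The class labels and the function name are assumptions; the membership lists are exactly the naturally occurring examples named in this paragraph, so an amino acid not named there (for example, glycine) returns no class.

```python
# Side-chain classes using only the naturally occurring examples listed above.
AMINO_ACID_CLASSES = {
    "positive": {"arginine", "lysine", "histidine"},
    "negative": {"aspartic acid", "glutamic acid"},
    "hydrophobic": {
        "alanine", "leucine", "isoleucine", "valine",
        "proline", "phenylalanine", "tryptophan", "methionine",
    },
    "uncharged_hydrophilic": {
        "serine", "threonine", "tyrosine",
        "asparagine", "glutamine", "cysteine",
    },
}


def side_chain_class(name):
    """Return the class label for an amino acid name, or None if unlisted."""
    key = name.strip().lower()
    for label, members in AMINO_ACID_CLASSES.items():
        if key in members:
            return label
    return None


print(side_chain_class("Lysine"))      # positive
print(side_chain_class("methionine"))  # hydrophobic
print(side_chain_class("glycine"))     # None
```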
[0025] As used herein, a polypeptide refers to a polymer of two or more amino acids
(i.e., amino acid residues) linked via a peptide bond, which forms when the carboxyl carbon
of the carboxylic acid group bonded to the α-carbon of one amino acid (or amino
acid residue) becomes covalently bound to the amino nitrogen atom of the amino group
bonded to the α-carbon of an adjacent amino acid. A protein that includes one or more
polypeptide subunits (e.g., DNA polymerase III, RNA polymerase II) or other
components (e.g., an RNA molecule, as occurs in telomerase) will also be understood to
be included within the meaning of "polypeptide" as used herein. Similarly, fragments of
full-length proteins are also "polypeptides".

[0026] The amino acid sequence of a given naturally occurring polypeptide (i.e., the
polypeptide's "primary structure") can be determined by the nucleotide sequence of the
coding portion of an mRNA, which is in turn specified by genetic information, typically
genomic DNA (including organelle DNA, e.g., mitochondrial or chloroplast DNA).
[0027] The secondary structure of a polypeptide refers to the local regular structure of a
polypeptide segment, without considering the conformations of the side chains of its
residues. Common secondary structure elements include α-helix and β-strand. The tertiary
structure refers to the three-dimensional arrangement of all atoms in a polypeptide chain.
[0028] An amino acid residue of a polypeptide interacts with adjacent residues (e.g.,
residues that are adjacent in the primary, secondary or tertiary structure of a polypeptide)
as well as with ligands or substrates based, in part, on the type of side chain group present.
For example, hydrophobic amino acids are more likely to interact with other hydrophobic
amino acids or hydrophobic molecules. Similarly, hydrophilic amino acids are more
likely to interact with other hydrophilic amino acids or hydrophilic molecules. These
types of interactions can be identified and characterized as discussed herein based upon a
residue's chemical characteristics as well as its interaction with adjacent atoms or
molecules.

[0029] As used herein, a nucleic acid refers to DNA and RNA, which are both linear
polymers of nucleotide subunits. Each nucleotide unit contains a base, a sugar and a
phosphate. In DNA, the sugar is deoxyribose, and there are four types of bases: adenine
(A), thymine (T), guanine (G), and cytosine (C). In RNA, the sugar is ribose and the bases
are adenine (A), uracil (U), guanine (G), and cytosine (C). In either DNA or
RNA, the base is linked to the sugar moiety through a beta-glycosyl linkage, and the
nucleotide units are joined together through phosphodiester bonds with phosphates at O3'
and O5' of the sugars.
[0030] The details of one or more embodiments are set forth in the accompanying
drawings and the description below. Other features, objects, and advantages will be
apparent from the description and drawings, and from the claims.

DESCRIPTION OF DRAWINGS 
[0031] FIG. 1 is a flow chart depicting a method of generating a SIFt.
[0032] FIG. 2A is an overlay of 100 different docking poses of SB203580 (shown in
cyan stick models) in the vicinity of the target protein human p38 (PDB accession code:
1a9u). p38 is shown as a ribbon model, and the shades represent different sub-regions of
the 34 ligand binding site residues: R - Gly-rich loop, G - segment from β3 to β4
(including αC), B - β5 and hinge region, M - catalytic loop, Y - Mg loop, O - activation
segment. A color version of this figure can be found in Deng, Z.; Chuaqui, C.; Singh, J.
"Structural Interaction Fingerprint (SIFt): A novel method for analyzing three-
dimensional protein-ligand binding interaction," J. Med. Chem., 47: 337-344 (2004).
[0033] FIG. 2B is a hierarchical clustering of the SIFts of 100 SB203580 docking
poses. A color version of this figure can be found in Deng, Z. et al., J. Med. Chem., 47:
337-344 (2004). Each SIFt is represented as one line in the heat map in the middle of
the figure, and only ON-bits (1) are shown as blocks. The right side of the heat map
shows the hierarchical clustering results on the fingerprints, including the dendrogram
and the reorganized distance matrix. Colors (represented here as shades of gray) in the
distance matrix correspond to the actual pair-wise distance between two SIFts, with dark
red (e.g., cutting from top right to bottom left) being the most similar and dark blue (e.g.,
in the northwest and southeast corners) being the least similar. SIFts in the heat map are
rearranged according to the order given by hierarchical clustering. The seven major
clusters (labeled 1 - 7) identified from the dendrogram are marked on the left side of the
SIFt heat map. The three lines of blocks above the heat map indicate the locations of the
corresponding binding site residues and the bits. In the middle line (alternating shades of
gray), each block represents a particular binding site residue, arranged in ascending
residue numbers. Within each residue there are seven different binding bits, represented
by seven smaller blocks in the third line. Also, the residues are grouped into six different
regions as described in FIG. 2A, as indicated in the first line.

[0034] FIGS. 2C-2I collectively are overlays of the poses within each of the seven
clusters (labeled 1 - 7), in the same reference frame as FIG. 2A. The crystal structure of
SB203580 in the 1a9u structure is also shown in each figure as a stick model. Color
versions of these figures can be found in Deng, Z. et al., J. Med. Chem., 47: 337-344
(2004). Among the binding site residues, only those in contact with the respective
clusters are shaded, using the same scheme as in FIG. 2A.

[0035] FIG. 3A is a graph showing the PMF docking scores as a function of SIFt
cluster number.
[0036] FIG. 3B is a graph showing the Consensus docking score as a function of SIFt
cluster number.
[0037] FIG. 4A is a representation of ligand binding site residues of protein kinases.
Shown are the murine PKA (ribbon model) and the ATP molecule (stick model) of the
crystal structure 1atp, which was used as the reference structure for the kinase SIFt
construction. Residues are grouped into five different regions, shown in shades of gray.
The grouping and shading scheme are the same as in FIG. 2A. A color version of this
figure can be found in Deng, Z. et al., J. Med. Chem., 47: 337-344 (2004).
[0038] FIG. 4B is a hierarchical clustering of SIFts of 89 protein kinase crystal
structures. On the right are the dendrogram and the corresponding reorganized distance
matrix map. SIFts are reorganized according to the order given by the dendrogram. Six
different regions are labeled above the SIFt heat map. Three major clusters (1 - 3) are
labeled on the left side of the heat map. A color version of this figure can be found in
Deng, Z. et al., J. Med. Chem., 47: 337-344 (2004).
[0039] FIG. 4C is a comparison of the structures of the three different binding modes
from FIG. 4B. Three representatives are shown for each cluster.
[0040] FIGS. 5A and 5B are graphs showing the comparison of database enrichment
using SIFt with ChemScore (FIG. 5A) and PMF score (FIG. 5B). Sixteen known p38
inhibitors were diluted in 1,000 diverse compounds. For each compound, 30 different
docking poses were retained and their respective ChemScores and Tanimoto coefficients
(compared with the crystal structure 1a9u) were calculated. The best Tanimoto
coefficient among the 30 docking poses of a compound is plotted against the best
ChemScore or PMF score of the same molecule. The dark dots in the figures represent
the 16 known inhibitors, and the lighter dots represent the 1,000 random compounds. The
dotted lines indicate the corresponding cut-off scores used to filter the docking poses in
order to recover 14 out of 16 (87.5%) known inhibitors. Color versions of these figures
can be found in Deng, Z. et al., J. Med. Chem., 47: 337-344 (2004).
[0041] FIG. 6 is a schematic example of an embodiment (i.e., bit-string) of the
method of FIG. 1.
[0042] FIG. 7A is a schematic diagram depicting the decomposition of a molecule
into a core and variable groups.

[0043] FIG. 7B is a hierarchical clustering of the SIFts of 100 docking poses. The
SIFts are constructed to represent different R-groups and the core of the molecule. Each
selected position of the target molecule is made up of four binary bits, representing core,
R1, R2, R3, and R4, respectively. Each SIFt is shown as one line in the heat map on the
left of the figure, and only ON-bits are shown. The shades (colors) of the heat map blocks
indicate different R-groups: red - core, blue - R1, yellow - R2, green - R3. The right
side of the figure shows the hierarchical clustering results on the fingerprints, including
the dendrogram and the reorganized distance matrix. SIFts in the heat map are
reorganized according to the order given by the hierarchical clustering. The shaded
(colored) bar on top of the SIFt heat map represents five corresponding kinase structural
sub-regions in the fingerprints. These sub-regions, each shaded (colored) differently,
include the Gly-rich loop (G-loop), the region spanning from β3 to β4 (β3 to β4), β5 and
the hinge region, catalytic loop and magnesium loop.

[0044] FIGS. 7C and 7D show the structures of the poses in cluster 1 (7C) and cluster 2
(7D), respectively, as identified by the hierarchical clustering of their R-SIFts (FIG. 7B),
in the context of the p38 crystal structure (1a9u). The poses are shown in gray, and the co-
crystal structure of SB203580 is shaded according to atom types. The five kinase sub-
regions that are in contact with the poses within the group are shaded using the same
shading scheme as described in FIG. 2B and FIG. 7B.

[0045] FIG. 8 is a hierarchical clustering of the SIFts of the 100 docking poses. Here
the SIFt patterns contain 7 bits per selected position, each representing one of the seven
chemical features of the molecule: red - hydrogen bond acceptor (HBA), blue - hydrogen
bond donor (HBD), yellow - hydrophobic (HPH), green - polar (POL), [...] - negatively
charged (NEG), orange - positively charged (POS), black - aromatic ring (AROM). The
hierarchical clustering is based on the new SIFt patterns incorporating the chemical
features of the molecules.

[0046] FIG. 9A is an interaction profile generated from the SIFt patterns of four p38
crystal structures - 1a9u, 1bl6, 1bl7, and 1bmk. The X-axis shows the p38 residue numbers
of the interaction bits; the Y-axis represents the conservation scores of the interaction bits.
[0047] FIG. 9B shows the p38 inhibitor database enrichment performance using the
SIFt-based approach. A library comprised of 16 known p38 inhibitors and 1000 random
compounds was docked onto the p38 target molecule and enriched using the SIFt-based
score ranking method. The X-axis is the percentage of the whole library collected, and the
Y-axis is the percentage of active compounds harvested. For comparison, the enrichment
performances by two conventional scoring functions (ChemScore and PMF Score) are
also shown.

DETAILED DESCRIPTION

[0048] As used herein and in the appended claims, the singular forms "a," "and," and
"the" include plural referents unless the context clearly dictates otherwise. Thus, for
example, reference to "a protein" includes a plurality of proteins and reference to "the
polypeptide" generally includes reference to one or more polypeptides and equivalents
thereof known to those skilled in the art, and so forth.

[0049] Unless defined otherwise, all technical and scientific terms used herein have
the same meaning as commonly understood to one of ordinary skill in the art. Although
any methods, devices and materials similar or equivalent to those described herein may be
used, the typical methods, devices and materials are now described.

[0050] All publications mentioned herein are incorporated herein by reference in full
for the purpose of describing and disclosing the databases, proteins, and methodologies
described in the publications that might be used in connection with the presently
described techniques. The publications discussed above and throughout the text are
provided solely for their disclosure prior to the filing date of the present application.
Nothing herein is to be construed as an admission that the inventors are not entitled to
antedate such disclosure by virtue of prior invention.

[0051] Techniques are provided for a simple and robust method for representing and
analyzing three-dimensional target molecule-ligand interactions. This method generates a
structural interaction fingerprint (SIFt) - a representation of the interactions in the three-
dimensional binary complexes, i.e., target molecule-ligand (e.g., protein-ligand or nucleic
acid-ligand) complexes. The representation is in the form of an information string (e.g., a
binary bit string) containing a plurality of information blocks, each of which, in turn,
contains a plurality of information units. Before one constructs a SIFt, one has to select
the binary (target molecule-ligand) complexes.
A. Construction and Analysis of SIFts

I. Selection of Three-Dimensional Binary Complex Structures
[0052] The SIFt-based method employs a set of three-dimensional binary structures
(e.g., the molecular docking results) to generate a set of SIFts. The set of structures can
be obtained from different poses of a selected pair of target molecule (e.g., a protein such
as a kinase) and ligand (e.g., a natural ligand or an inhibitor). See, e.g., Example 1
wherein the set of structures was obtained from 100 different poses of a pyridinyl
imidazole inhibitor docking onto a single protein kinase p38 structure. In another aspect,
the set of structures can be obtained from structural data (e.g., docking results) of a
number of different ligands interacting with a single target molecule. See, e.g., Example
2 wherein the set of structures was obtained from docking a group of different small
molecules (a library of 1,016 small molecules) onto the same target molecule (a protein
kinase p38 structure). In a further aspect, the set of structures can be obtained from
different target molecules and different ligands (see, e.g., Example 3 wherein both the
target molecules (protein kinases) and ligands are different). Using different target
molecules requires additional structural and sequence alignment steps, which will be
further discussed below. Once a set of structures has been obtained, one can proceed to
construct SIFts.

II. Construction of a SIFt 

(i) Identification of the Selected Positions of a Target Molecule
[0053] The next step involves selection of a set of positions ("selected positions") on
the target molecule of each of the structures where each of these selected positions is
commonly involved in interactions (e.g., non-covalent interaction) between the target
molecule and the ligand. These positions serve as reference points covering all of the
interactions in the target molecule-ligand complex, and are then used as the common
reference frame for constructing SIFts.

[0054] But how does one determine the location of the interactions between the target
molecule and the ligand? The selected positions are defined as regions of the target
molecule that are in contact with the ligand. Different methods have been developed to
determine whether contacts have been made between the target molecule and the ligand
in the context of a particular interaction. Below is a description of two exemplary
methods.

[0055] For example, the program AREAIMOL of the CCP4 suite (which refers to
"Collaborative Computational Project, Number 4." See The CCP4 suite: programs for
protein crystallography, Acta Cryst. D50, 760-763, 1994; and Lee et al., J. Mol. Biol.
55: 379-400, 1971) can be used to identify the target molecule atoms that are involved in
the non-covalent intermolecular interactions with the ligand. AREAIMOL evaluates the
solvent accessible area by allowing a probe sphere of 1.4 Å to roll over the Van der
Waals surface [...]. Solvent molecules can be excluded for the sake of simplicity, although in
[...] ordered solvent molecules can be included and treated in the same way as target
molecule atoms. For protein target molecules, if non-hydrogen atoms show that their solvent
accessibility decreases upon ligand binding and these atoms are also within 4.5 Å of any
non-hydrogen atoms of the ligand, the residues corresponding to these atoms are defined
as selected positions (or ligand binding atoms). The determination of selected
positions in nucleic acids can be done in a similar manner.
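The two criteria just described (reduced accessibility on binding, and proximity within 4.5 Å of a non-hydrogen ligand atom) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `select_positions`, the dictionary keys, and the idea of passing pre-computed accessibility values are assumptions made here, not part of AREAIMOL or of the described method.

```python
from math import dist

def select_positions(target_atoms, ligand_coords, cutoff=4.5):
    """Return the sorted residue numbers that qualify as selected positions.

    target_atoms: iterable of dicts for non-hydrogen target atoms with keys
        'residue' (int), 'coord' (x, y, z), 'asa_free' and 'asa_bound'
        (accessible area without and with the ligand; hypothetical inputs
        that an accessibility program would supply).
    ligand_coords: (x, y, z) tuples of the non-hydrogen ligand atoms.
    cutoff: distance threshold in angstroms (4.5 in the text above).
    """
    selected = set()
    for atom in target_atoms:
        # Criterion 1: accessibility decreases upon ligand binding.
        buried = atom["asa_bound"] < atom["asa_free"]
        # Criterion 2: within the cutoff of any non-hydrogen ligand atom.
        near = any(dist(atom["coord"], lc) <= cutoff for lc in ligand_coords)
        if buried and near:
            selected.add(atom["residue"])
    return sorted(selected)
```

A residue is reported once even if several of its atoms satisfy both criteria, which matches the residue-level definition of a selected position.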
[0056] As to hydrogen bonding interactions between the target molecule and the
ligand, one can employ programs such as HBPLUS. See McDonald et al., J. Mol. Biol.
238: 777-793, 1994. HBPLUS calculates and lists all possible hydrogen bond donor and
acceptor atoms [...].
[0057] Once the [...] atoms and their respective residues or bases have been identified, the
binding positions are computed and defined as the "selected positions" of the target
molecule. As mentioned above, different target molecules can be used. In such
cases, additional structural and sequence alignment steps are required to convert
different but related target molecules into a standard residue numbering system, so that a
common framework can be employed for constructing the SIFts (see, e.g., Example 3).

(ii) Determination and Calculation of Interaction Types
[0058] After identification of the selected positions (i.e., regions of the target
molecule where intermolecular interactions take place), one has to determine and
calculate the types of interactions present at these positions. In one embodiment, the
target molecule can be a polypeptide or a protein, and seven interaction types are
determined based on the AREAIMOL and HBPLUS results. The presence or absence of
each of these interaction types is recorded for each selected position:
1) whether or not it is in contact with the ligand, 2) whether or not any main chain
(backbone) atom is involved in the contact, 3) whether or not any side chain atom is
involved in the binding, 4) whether or not polar interaction is involved, 5) whether or not
non-polar interaction is involved, 6) whether or not it provides hydrogen bond
acceptor(s), and 7) whether or not it provides hydrogen-bond donor(s). The answer to
each question provides an information unit (in this embodiment, a bit) that corresponds to
a particular selected position. By joining the information units together, an information
block is formed (in this embodiment, a seven-bit-long block). The entire SIFt can then
be constructed by sequentially concatenating the information blocks of each of the selected
positions together, according to ascending position number (e.g., [...]).
[0059] The SIFts resulting from a set of structures are therefore of the same length,
and each information unit (e.g., bit) in the fingerprint represents the strength or the
presence/absence of a particular interaction type at a particular selected position. As a
result, the SIFts are directly comparable. Once SIFts are generated from a set of
structures, one can perform analyses of the SIFts to obtain valuable interaction pattern
information (e.g., the degree of binding conservation among the target molecule-
ligand complexes).
[0060] The interaction types can be classified in a number of ways. For example, the
interaction types can be fragment constants descriptors (e.g., hydrophobicity, hydrogen
bond acceptor, hydrogen bond donor), electronic descriptors (e.g., charge, partial positive
surface area, partial negative surface area, dipole moment, atomic polarizability),
topological descriptors (e.g., Wiener index, Zagreb index, Hosoya index), molecular
flexibility indices, spatial descriptors (e.g., shadow indices, molecular surface area,
[...] principal moment of inertia, molecular volume), structural descriptors (number of
chiral centers, molecular weight, number of rotatable bonds), or thermodynamic
descriptors (e.g., partition coefficient, desolvation free energies for water and octanol,
[...]).
[0061] Hydrophobicity is a measure of the thermodynamics of the partitioning of a
molecule or part of a molecule between water and a non-aqueous phase (e.g., an organic
solvent), in particular, the free energy change (ΔG) associated with transferring a
molecule or part of the molecule from a non-aqueous phase to water. In one popular
definition (CATALYST™, Accelrys Inc., San Diego, CA 92121, USA), a contiguous set
of atoms are defined as hydrophobic if they are not adjacent to any concentrations of
charge (charged atoms or electronegative atoms), in a conformation such that the atoms
have surface accessibility, including phenyl, cycloalkyl, isopropyl, and methyl.
III. Analysis of SIFts 

(i) Measurement of Similarity of SIFts 
[0062] As discussed above, each SIFt represents the interaction profile between a
target molecule and a ligand. It follows that similar SIFts reflect similar interaction
patterns among the target molecule-ligand pairs.
[0063] Different methods can be employed to measure similarity between SIFts. For
example, one can use the Tanimoto coefficient (Tc, see Willett, J. Chem. Inf. Comput. Sci.
38: 983-996, 1998), which reflects the quantitative measurement of the similarity. Using
the bit-string embodiment described above, Tc between bit-strings A and B is defined as

$$T_c = \frac{|A \cap B|}{|A \cup B|}$$

where $|A \cap B|$ is the number of ON-bits common to both A and B, and
$|A \cup B|$ is the number of ON-bits present in either A or B.
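As a worked illustration of this definition, the following Python sketch computes Tc for two bit strings of equal length. The function name `tanimoto` is chosen here for illustration, and the value returned when neither string has any ON-bit (where the ratio is undefined) is an arbitrary convention of this sketch, not something the text specifies.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two equal-length bit strings of '0'/'1'."""
    if len(a) != len(b):
        raise ValueError("SIFts must have the same length to be compared")
    # |A n B|: ON-bits common to both strings.
    common = sum(x == "1" and y == "1" for x, y in zip(a, b))
    # |A u B|: ON-bits present in either string.
    either = sum(x == "1" or y == "1" for x, y in zip(a, b))
    # Convention assumed here: two all-OFF strings score 0.0.
    return common / either if either else 0.0
```

For example, `tanimoto("1010100", "1000101")` has two ON-bits in common out of four ON-bits in total and evaluates to 0.5.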
(ii) Classification of SIFts Based on Similarity

[0064] Based on the similarity measurements, one can classify similar SIFts 
displaying similar interaction patterns for further analysis, using methods such as 
hierarchical clustering. 

From clustering results, structures can be clustered into groups having similar binding
modes.
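The grouping step can be sketched with a small agglomerative clustering routine over the distance 1 - Tc. The text does not specify a linkage rule or a stopping criterion, so the average linkage and the distance `threshold` used below are assumptions of this sketch, as are the function names; a production analysis would more likely rely on an established clustering library.

```python
def tanimoto_distance(a, b):
    """Distance between two bit strings, defined here as 1 - Tc."""
    common = sum(x == "1" and y == "1" for x, y in zip(a, b))
    either = sum(x == "1" or y == "1" for x, y in zip(a, b))
    return 1.0 - (common / either if either else 0.0)

def cluster_sifts(sifts, threshold):
    """Group SIFts by average-linkage agglomerative clustering (assumed rule).

    Returns a list of clusters, each a sorted list of indices into `sifts`.
    Merging stops once the closest pair of clusters is farther apart than
    `threshold`.
    """
    clusters = [[i] for i in range(len(sifts))]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        total = sum(tanimoto_distance(sifts[i], sifts[j])
                    for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]   # merge the closest pair
        del clusters[j]
    return [sorted(c) for c in clusters]
```

Each returned cluster then corresponds to a candidate group of structures sharing a similar binding mode.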

[0065] To analyze and compare the interaction patterns within a group or between
groups, an interaction profile can be generated by quantifying the degree of similarity of
each information unit at each selected position within the SIFts. One example is to
calculate an interaction conservation score for each information unit (e.g., bit) among
each group. This score represents the percentage of SIFts in which the bit is ON (i.e.,
occurrence or presence of the interaction type) at this particular selected position. The
higher the score, the more conserved this interaction type is within this group. Variations
in the conservation scores between two groups reveal the differences of their interaction
patterns.
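The interaction conservation score lends itself to a short sketch: for each bit position, count the fraction of SIFts in the group that have the bit ON and express it as a percentage. The function name `conservation_profile` is hypothetical.

```python
def conservation_profile(sifts):
    """Per-bit conservation scores (percent of SIFts with the bit ON).

    sifts: non-empty list of equal-length bit strings from one group.
    Returns a list with one score in [0, 100] per bit position.
    """
    if not sifts:
        raise ValueError("at least one SIFt is required")
    length = len(sifts[0])
    if any(len(s) != length for s in sifts):
        raise ValueError("all SIFts in a group must have the same length")
    n = len(sifts)
    return [100.0 * sum(s[k] == "1" for s in sifts) / n
            for k in range(length)]
```

Computing one profile per group and subtracting them position by position gives a simple view of where two groups differ in their interaction patterns.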

B. A High-Level View of the SIFt-Based Method

[0066] FIG. 1 shows a high-level view of an exemplary method for generating a SIFt.
The method utilizes entries contained in structural databases containing data from various
sources, e.g., X-ray crystallography, NMR, protein modeling, and/or protein/ligand
interaction simulations (100). At block 200, three-dimensional data/structures of one or
more complexes are retrieved from a database. Using any of a variety of computational
methods well known to those in the art, a set of selected positions (e.g., amino acid
residues or bases) that interact with a putative ligand or binding molecule are selected at
block 300.
[0067] Once a three-dimensional structure has been derived and selected positions
(e.g., binding site residues) identified, a plurality of intermolecular interaction types
occurring at each selected position is determined and measured at block 400, using any
computational methods well known in the art. These interaction types can also include
chemical and physical properties of the part of a ligand interacting with each selected
position, and sequence conservation, structural conservation and flexibility properties of
each selected position.

[0068] At block 500, a SIFt for each target molecule-ligand complex structure is
generated. The SIFt includes a numeric (e.g., binary) code representation of each
interaction type determined/measured for each of the selected positions of the target
molecule.
[0069] At block 600, the SIFt containing information regarding characteristics of the
interaction types at each selected position is stored within a database for subsequent
retrieval and analysis. Alternatively, the SIFt can be used to query a database (block
650), generate an interaction profile comprising possible alternative ligands that fit the
SIFt (block 625), and/or define a structure based upon the type of SIFt obtained (block
675).

[0070] In one embodiment, a primary amino acid sequence of a polypeptide target
molecule that is encoded by a selected genetic sequence is determined, and a three- 
dimensional structure is generated by homology modeling techniques. This aspect is 
generally represented in FIG. 1 as block(s) 100. As mentioned above, a three- 
dimensional model of a particular target molecule may be predicted computationally or 
determined in whole or in part based on experimental information. For example, x-ray 
crystallographic information may be used to identify a protein structure and provide 
information for constructing a three-dimensional model of the protein target molecule. 
[0071] In one embodiment, a ligand's three-dimensional structure is also obtained by
similar techniques (e.g., modeling techniques and/or experimental crystallization
techniques). For example, many protein molecules are co-crystallized with substrates
and/or ligands. The three-dimensional ligand binding structure can then be modeled
using programs that demonstrate interactions with a putative protein target molecule or
binding domain thereof. Thus, one of skill in the art utilizing the 3D-protein structure
and/or the 3D-ligand structure can obtain interaction data for the molecules being
characterized. The ligand molecule may be any of a number of different types of
compositions such as organic molecules, inorganic molecules, ions, proteins, protein
fragments, nucleotides, RNA, DNA or other molecules representative of substrates,
ligands, co-factors, and the like. In one embodiment, the ligand is obtained from a library
of molecules.
[0072] Upon formation of the 3D complex structure, the interaction of the target
molecule with a ligand is computed. Positions (e.g., amino acid residues) that play a role
in the interaction with the ligand are selected. This is generally represented by block 300
of FIG. 1. Particular atoms in the ligand can be identified as interacting with particular
amino acid residues or bases of the target molecule. The criteria for determining an
interaction (e.g., distance (e.g., in angstroms) between various atoms) can be adjusted
using techniques in the modeling programs as mentioned above or by techniques known
to those skilled in the art.

[0073] The target molecule-ligand interactions that are modeled result in the 
identification of certain selected positions (e.g., amino acid residues or bases) as well as 
the nature of interaction types between the ligand and the target molecule. The 
interaction types between a ligand and a particular selected position will depend upon the 
chemical-physical characteristics of the selected position in the target molecule as well as 
the nature of atoms or groups of atoms present in the ligand. For example, one of skill in
the art will recognize that various equilibrium binding constants or binding energy values
will be determinative in the type of interactions that will occur. This process is 
represented in FIG. 1 by block 400. 

[0074] The selected positions that play a role in interacting with the ligand as well as
the interaction types that occur with each selected position are then used to generate a
SIFt (see, e.g., block 500 of FIG. 1). This SIFt can be represented by a series of
numerical values (e.g., binary numbers) corresponding to each selected position and each
interaction type. The selected position and interaction type form a SIFt that can be used
to compare or distinguish the target molecule (or a family of target molecules) from other
target molecules. Using the SIFt as a tool for comparison, target molecules (e.g., proteins
or polypeptides) may be structurally or functionally associated when they share
commonalities in the SIFts. This latter process is represented in FIG. 1 by block 675. For
example, by aligning the SIFts of two protein target molecules, a functional relationship
can be determined based upon the degree of alignment (e.g., homology) between the two
information strings or SIFts. Various statistical measurements and limits can be placed
upon the alignment to discriminate between random and related alignments.

Accordingly, a powerful tool is provided to associate target molecules in a manner that
does not rely on sequence or homology matching/comparisons alone, and to allow for the
association of otherwise dissimilar target molecules that can be functionally related by
their SIFts.
[0075] In certain embodiments, the SIFt fingerprint records the presence or absence
of an interaction with a protein. The information unit containing this information can be
simple to indicate whether a residue is involved in a particular interaction or not. In other
embodiments, the SIFt can also include other chemical information about the ligand. In
one example, a SIFt can include an information unit that contains information about a
combinatorial library, which can include a core and variable group (in some examples,
two, three or more R groups). Specifically, a small molecule library can be converted
into a core and variable groups, a SIFt pattern can be created for each library member, and
information units can be turned on or off at each of the selected positions based on the
nature of the contact between the core and variable groups with the protein target. In
another example, a SIFt can include an information unit that contains chemical feature
information. For example, a series of chemical features can be mapped onto the ligand
molecule. Each residue can be represented by an information block of a series of
information units, each of which can be turned on or off depending on whether this
residue is interacting with a particular chemical feature on the ligand. Examples of
suitable chemical features include hydrophobic, hydrogen bond donor, hydrogen bond
acceptor, negatively charged, positively charged, etc. In another example, a computed or
experimentally determined property can be included in a SIFt. Information blocks that
include these properties can be used to identify chemical groups that are associated with
specific residues of the protein.
C. Embodiments and Applications 

[0076] As discussed above, one embodiment involves the use of a seven-bit
information block (e.g., contact, main-chain atom group, side-chain atom group, polar,
non-polar, hydrogen bond donor, hydrogen bond acceptor) to represent the interaction
pattern of each selected position of the target molecules (e.g., binding site residue of a
protein target molecule). In such an embodiment, the interaction pattern represents the
binding modes formed from seven different interaction types. Although such an
implementation has been shown to successfully organize, analyze and mine a
large structural library in a meaningful way, a 7-bit-long binary string does not
represent all the intermolecular interactions occurring at a particular selected position.
The richness of information can be improved by incorporating more bits representing
other interaction types. For example, one can focus on functional groups instead of the
entire residue as the basic unit, or take solvent molecules into consideration, or substitute
the Boolean bits with ordinal or continuous data that reflect the strength and
energetics of the interaction types. Such an enriched SIFt provides a "higher-resolution"
picture of the target molecule-ligand binary complex. In situations where computational
speed is a critical issue, "lower-resolution" SIFts using fewer information units may be
used. Accordingly, the number of information units for a particular selected position
(i.e., the size of the information block) may range from 1 to 50 units or more. Simpler
SIFts can be constructed in less time at the expense of richness of information. One
skilled in the art can design, select, and identify the number of information units (and
thus the size of the information block) for a particular selected position based upon the
details and speed desired. For example, shorter information strings (containing, e.g., 2-3
information units per information block) may be useful during the initial screening of a
huge virtual library. On the other hand, longer information strings (and hence longer
SIFts) provide more information at the expense of quick performance and are more useful
for detailed structural analysis such as comparing groups of closely related structures.
Choosing the right size of SIFt is a matter of finding a proper balance between these two
competing considerations, with that balance dictated by the needs of a given situation.
Another variable is the relative weight given to each interaction type. In one embodiment,
information units reflecting each interaction type can contribute equally to the total
similarity score. It is also possible to weight them differently by focusing on one or
more particular interaction types, while down-playing other kinds of interactions.
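The weighting idea in the preceding paragraph can be illustrated with a short sketch. It assumes binary fingerprints built from fixed-size information blocks and a Tanimoto-style similarity; the function name `weighted_tanimoto`, the bit order in `INTERACTION_TYPES`, and the handling of two empty fingerprints are illustrative assumptions, not part of the described embodiments.

```python
from typing import Sequence

# Hypothetical bit order inside one seven-bit information block.
INTERACTION_TYPES = (
    "contact", "main_chain", "side_chain", "polar",
    "non_polar", "hbond_donor", "hbond_acceptor",
)


def weighted_tanimoto(fp_a: Sequence[int], fp_b: Sequence[int],
                      weights: Sequence[float]) -> float:
    """Tanimoto-style similarity with one weight per interaction type.

    `weights` has one entry per bit of an information block and is reused
    for every block. Equal weights give the ordinary Tanimoto coefficient.
    """
    block = len(weights)
    if len(fp_a) != len(fp_b) or len(fp_a) % block != 0:
        raise ValueError("fingerprints must share a length that is a multiple of the block size")
    shared = 0.0  # weighted count of bits set in both fingerprints
    either = 0.0  # weighted count of bits set in at least one fingerprint
    for i, (a, b) in enumerate(zip(fp_a, fp_b)):
        w = weights[i % block]
        shared += w * (a & b)
        either += w * (a | b)
    # Assumed convention: two all-zero fingerprints are treated as identical.
    return shared / either if either else 1.0
```

Passing `[1.0] * 7` treats all interaction types equally, while raising the weight of, say, the polar bit makes agreement or disagreement on polar contacts count more toward the score.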
[0077] One advantageous feature of the SIFt-based method is that it is generic.
Although it works well for the protein target molecule and small molecule ligand system,
the method can also work for other systems, including protein-protein, nucleic
acid-ligand, nucleic acid-protein/polypeptide systems, and the like. Indeed, the methods
and systems are applicable to amino acid sequences, as well as nucleotide sequences. For
example, the methods can be applied to a nucleotide sequence or an amino acid sequence
which corresponds to the nucleotide sequence in question. If the coding sequence is not
known, translation from the nucleotide sequence to the amino acid sequence may be
performed in all frames of the nucleotide sequence. Programs that can translate a
nucleotide sequence are known in the art.
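As a rough illustration of translating a nucleotide sequence in all frames, the following sketch uses the standard genetic code and reports the three forward and three reverse-complement frames. The function names and the frame labels ("+1" to "-3") are assumptions made for this example; unknown or ambiguous codons are rendered as "X".

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG codon order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq: str) -> str:
    """Translate one reading frame; trailing bases that do not fill a codon are ignored."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")  # 'X' marks an unrecognized codon
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq: str) -> dict:
    """Return translations for the three forward and three reverse frames."""
    seq = seq.upper()
    reverse = seq.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames
```

Each resulting amino acid sequence could then be treated as a candidate primary sequence for the target molecule.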
[0078] In one embodiment, the method can start by identifying a primary amino acid
sequence of a protein. A number of source databases are available, as described below,
that contain nucleotide sequences and/or deduced amino acid sequences for use with this
[0079] The primary direct experimental methods for determining the structure of
proteins involved in particular interactions are X-ray crystallography, relying on the
interaction of electron clouds with X-rays; and liquid nuclear magnetic resonance (NMR),
relying on correlations between polarized nuclear spins interacting via indirect
dipole-dipole interactions. X-ray methods provide information on the location of every heavy
atom in a crystal of interest, accurate to 0.5-2.0 Å (1 Å = 10^-8 cm).
[0080] A number of databases are available that contain 3D protein structures and/or
structures showing 3D protein-ligand interactions. For example, protein-protein
interaction databases include the Biomolecular Interaction Network Database (BIND),
which is a database designed to store full descriptions of interactions, molecular
complexes and pathways; Database of Interacting Proteins (DIP), which catalogs
experimentally determined interactions between proteins; an Object Oriented Database
for Protein-Protein Interactions (INTERACT); and Pronet Online, which provides
protein-protein interaction data and is maintained by Myriad Genetics. Other structural
databases include Cambridge Crystallographic Data Centre; CATH - Protein Structure
Classification; SCOP (Structural Classification of Proteins), based upon 3D fold
classifications; PARTS LIST, which dynamically performs comparative fold surveys and
is built on top of SCOP's fold classification and acts as an accompanying annotation;
PDB (Protein Data Bank), which is an international repository for the processing and
distribution of 3D macromolecular structure data primarily determined experimentally by
X-ray crystallography and NMR; PRESAGE, a database for structural genomics;
Structural Biology Software Database, a software database maintained by University of
Illinois; BiMSSECOST, a conformational database for amino acid residues in proteins;
BioMagResBank, a repository for data on proteins, peptides, and nucleic acids from
NMR spectroscopy; SWISS-3DIMAGE 3D, which contains images of proteins and other
biological macromolecules; SWISS-MODEL, a repository of structures generated by
protein modeling; and the Cambridge Structural Database (CSD) of the Cambridge
Crystallographic Data Center (CCDC). Other sources of primary amino acid sequence,
modeled 3D structures and other crystallographical data will be apparent to those of skill
in the art.

[0081] The various techniques, methods, and aspects described above can be
implemented in part or in whole using computer-based systems and methods.
Additionally, computer-based systems and methods can be used to augment or enhance
the functionality described above, increase the speed at which the functions can be
performed, and provide additional features and aspects as a part of or in addition to those
described elsewhere in this document. Various computer-based systems, methods and
implementations in accordance with the above-described technology are presented below.
[0082] In one implementation, a general-purpose computer may have an internal or
external memory for storing data and programs such as an operating system (e.g., DOS,
Windows 2000™, Windows XP™, Windows NT™, OS/2, UNIX or Linux) and one or
more application programs. Examples of application programs include computer
programs implementing the techniques described herein, authoring applications (e.g.,
word processing programs, database programs, spreadsheet programs, or graphics
programs) capable of generating documents or other electronic content; client
applications (e.g., an Internet Service Provider (ISP) client, an e-mail client, or an instant
messaging (IM) client) capable of communicating with other computer users, accessing
various computer resources, and viewing, creating, or otherwise manipulating electronic
content; and browser applications (e.g., Microsoft's Internet Explorer) capable of
rendering standard Internet content and other content formatted according to standard
protocols such as the Hypertext Transfer Protocol (HTTP).

[0083] One or more of the application programs may be installed on the internal or 
external storage of the general-purpose computer. Alternatively, in another 
implementation, application programs may be externally stored in and/or performed by 
one or more device(s) external to the general-purpose computer.

[0084] The general-purpose computer includes a central processing unit (CPU) for 
executing instructions in response to commands, and a communication device for sending 
and receiving data. One example of the communication device is a modem. Other 
examples include a transceiver, a communication card, a satellite dish, an antenna, a
network adapter, or some other mechanism capable of transmitting and receiving data
over a communications link through a wired or wireless data pathway. 
[0085] The general-purpose computer may include an input/output interface that
enables wired or wireless connection to various peripheral devices. Examples of
peripheral devices include, but are not limited to, a mouse, a mobile phone, a personal
digital assistant (PDA), a keyboard, a display monitor with or without a touch screen
input, and an audiovisual input device. In another implementation, the peripheral devices
may themselves include the functionality of the general-purpose computer. For example,
the mobile phone or the PDA may include computing and networking capabilities and
function as a general purpose computer by accessing the delivery network and
communicating with other computer systems. Examples of a delivery network include
the Internet, the World Wide Web, WANs, LANs, analog or digital wired and wireless
telephone networks (e.g., Public Switched Telephone Network (PSTN), Integrated
Services Digital Network (ISDN), and Digital Subscriber Line (xDSL)), radio, television,
cable, or satellite systems, and other delivery mechanisms for carrying data. A
communications link may include communication pathways that enable communications
through one or more delivery networks.

[0086] In one implementation, a processor-based system (e.g., a general-purpose
computer) can include a main memory, preferably random access memory (RAM), and
can also include a secondary memory. The secondary memory can include, for example,
a hard disk drive and/or a removable storage drive, representing a floppy disk drive, a
magnetic tape drive, an optical disk drive, etc. The removable storage drive reads from
and/or writes to a removable storage medium. A removable storage medium can include
a floppy disk, magnetic tape, optical disk, etc., which can be removed from the storage
drive used to perform read and write operations. As will be appreciated, the removable
storage medium can include computer software and/or data.

[0087] In alternative embodiments, the secondary memory may include other similar
means for allowing computer programs or other instructions to be loaded into a computer
system. Such means can include, for example, a removable storage unit and an interface.
Examples of such can include a program cartridge and cartridge interface (such as that
found in video game devices), a removable memory chip (such as an EPROM or PROM)
and associated socket, and other removable storage units and interfaces, which allow
software and data to be transferred from the removable storage unit to the computer
system.

[0088] In one embodiment, the computer system can also include a communications
interface that allows software and data to be transferred between the computer system and
external devices. Examples of communications interfaces can include a modem, a
network interface (such as, for example, an Ethernet card), a communications port, and a
PCMCIA slot and card. Software and data transferred via a communications interface are
in the form of signals, which can be electronic, electromagnetic, optical or other signals
capable of being received by a communications interface. These signals are provided to
the communications interface via a channel capable of carrying signals and can be
implemented using a wireless medium, wire or cable, fiber optics or other
communications medium. Some examples of a channel can include a phone line, a
cellular phone link, an RF link, a network interface, and other suitable communications
channels.

[0089] In this document, the terms "computer program medium" and "computer
usable medium" are generally used to refer to media such as a removable storage device,
a disk capable of installation in a disk drive, and signals on a channel. These computer
program products provide software or program instructions to a computer system.
[0090] Computer programs (also called computer control logic) are stored in the main
memory and/or secondary memory. Computer programs can also be received via a
communications interface. Such computer programs, when executed, enable the
computer system to perform the features as discussed herein. In particular, the computer
programs, when executed, enable the processor to perform the described techniques.
Accordingly, such computer programs represent controllers of the computer system.
[0091] In an embodiment where the elements are implemented using software, the
software may be stored in, or transmitted via, a computer program product and loaded
into a computer system using, for example, a removable storage drive, hard drive or
communications interface. The control logic (software), when executed by the processor,
causes the processor to perform the functions of the techniques described herein.
[0092] In another embodiment, the elements are implemented primarily in hardware
using, for example, hardware components such as PAL (Programmable Array Logic)
devices, application specific integrated circuits (ASICs), or other suitable hardware
components. Implementation of a hardware state machine so as to perform the functions
described herein will be apparent to a person skilled in the relevant art(s). In yet another
embodiment, elements are implemented using a combination of both hardware and software.
[0093] In another embodiment, the computer-based methods can be accessed or
implemented over the World Wide Web by providing access via a Web Page to the
methods described herein. Accordingly, the Web Page is identified by a Universal
Resource Locator (URL). The URL denotes both the server and the particular file or page
on the server. In this embodiment, it is envisioned that a client computer system interacts
with a browser to select a particular URL, which in turn causes the browser to send a
request for that URL or page to the server identified in the URL. Typically the server
responds to the request by retrieving the requested page and transmitting the data for that
page back to the requesting client computer system (the client/server interaction is
typically performed in accordance with the Hypertext Transfer Protocol (HTTP)). The
selected page is then displayed to the user on the client's display screen. The client may
then cause the server containing a computer program to launch an application to, for
example, perform an analysis according to the described techniques. In another
implementation, the server may download an application to be run on the client to
perform an analysis according to the described techniques.

[0094] The described techniques open up the possibility of using an informatics
approach in three-dimensional structure analysis and structure-based drug discovery. One
application is in the area of the virtual chemical library screening process. As discussed
herein, SIFt can serve as a post-docking molecular organizer and filter. Docking poses
can be organized based on their overall interaction patterns or binding modes.
Furthermore, any previously acquired knowledge can be applied as structural constraints
to filter out unwanted poses, giving a smaller and better pool of lead compounds.
Compared to pharmacophore-based filters, the SIFt-based method is far more generic,
flexible and easy to apply. In combination with other pre-existing approaches such as
empirical docking scores, the SIFt-based method can weed out more false-positive
compounds with undesirable properties, leaving a smaller but better pool of lead
compounds, and thus significantly improve the hit rate.
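A minimal sketch of such a post-docking filter is shown below. It assumes seven-bit information blocks concatenated in binding-site order and expresses prior knowledge as lists of (residue position, interaction index) pairs that must or must not be set; the names `filter_poses`, `required`, and `forbidden` are hypothetical and chosen only for this illustration.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

BITS_PER_RESIDUE = 7  # seven-bit information block, as in the embodiment above


def bit_index(residue_position: int, interaction: int) -> int:
    """Index of one interaction bit within the concatenated fingerprint."""
    return residue_position * BITS_PER_RESIDUE + interaction


def filter_poses(
    poses: Dict[str, Sequence[int]],
    required: Iterable[Tuple[int, int]],
    forbidden: Iterable[Tuple[int, int]] = (),
) -> List[str]:
    """Keep poses whose fingerprints satisfy the structural constraints.

    `required` lists (residue position, interaction index) bits that must be 1;
    `forbidden` lists bits that must be 0.
    """
    required_idx = [bit_index(r, t) for r, t in required]
    forbidden_idx = [bit_index(r, t) for r, t in forbidden]
    kept = []
    for pose_id, fp in poses.items():
        has_required = all(fp[i] == 1 for i in required_idx)
        has_forbidden = any(fp[i] == 1 for i in forbidden_idx)
        if has_required and not has_forbidden:
            kept.append(pose_id)
    return kept
```

A constraint such as "the second binding-site residue must provide a hydrogen bond" would then be written as a single `(1, k)` pair, where `k` is whatever bit position the chosen encoding assigns to that interaction type.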

[0095] In addition, the SIFt-based approach can be applied in designing, refining and
pruning target-focused chemical libraries. As shown in Example 4, different embodiments
of SIFt (e.g., R-SIFt) can be very effective tools for discriminating compounds with
different binding modes. With R-SIFt, one can easily distinguish compounds that bind to
the target molecule with desirable binding mode(s) ("good molecules") and others that do
not ("bad molecules"). Based on this compound classification result, we can then generate
prediction models (e.g., decision tree, neural network, support-vector machine) to predict
the "good" and the "bad" compounds using their chemical properties as predictors. Such
prediction models can be applied in the early stage of virtual library screening to filter out
undesirable compounds in order to generate a smaller, target-specific pool of compounds.
[0096] Besides processing the virtual structures generated during chemical library
screening, the SIFt-based method can be used to analyze experimentally determined
structures. Furthermore, the methods are not limited to structures involving one
particular target molecule; the method is generic enough to work for structures of a family
of target molecules (e.g., the kinase family). The prerequisite is that these target
molecules are structurally related, so that a common framework of the ligand-binding site
can be constructed. By using this method, distinct sub-groups of target molecule-ligand
(e.g., enzyme-inhibitor) complex structures, each of which represents a distinct overall
interaction pattern, can be identified. The identified sub-groups of these target
molecule-ligand complexes can also be classified according to other grouping criteria, such as
grouping by different target molecule, by different types of ligands, or by different
conformations. Quantitative comparisons of these clusters would reveal interaction
patterns specific for a particular group and thus could provide structural insight into the
mechanism of binding activity and selectivity. In addition, the SIFt-based interaction
profile can capture the common features among a group of ligand-target molecule
structures. It can be used to compare different groups of structures, and to correlate the
differences or commonality in their SIFt profiles to their activities.
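One simple way to picture an interaction profile is as the per-bit frequency of each interaction across a group of fingerprints. The sketch below takes that view as an assumption; the function names are illustrative and the averaging scheme is one possible choice rather than the defined profile of the described techniques.

```python
from typing import List, Sequence


def interaction_profile(fingerprints: Sequence[Sequence[int]]) -> List[float]:
    """Fraction of structures in a group that have each bit set."""
    if not fingerprints:
        raise ValueError("at least one fingerprint is required")
    n = len(fingerprints)
    length = len(fingerprints[0])
    if any(len(fp) != length for fp in fingerprints):
        raise ValueError("all fingerprints must have the same length")
    return [sum(fp[i] for fp in fingerprints) / n for i in range(length)]


def profile_difference(profile_a: Sequence[float],
                       profile_b: Sequence[float]) -> List[float]:
    """Per-bit difference between two group profiles (a minus b)."""
    return [a - b for a, b in zip(profile_a, profile_b)]
```

Bits whose frequency is close to 1 in one group and close to 0 in another point to interactions that are specific to the first group.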
[0097] In sum, the methods of characterization and generation of information strings
representing SIFts provided by the described techniques are an improvement over
conventional characterization methodologies that typically rely on sequence-based
comparisons. The SIFt integrates several desirable functionalities, including
structural data visualization, organization, analysis, and mining, making it a
powerful tool for analyzing and profiling three-dimensional binding interactions. As
mentioned above, a particularly useful feature of this method is that it compares and reveals
associations (e.g., binding similarities) between dissimilar target molecules (e.g., proteins
that may have functional or behavioral analogies but are not obvious due to differences in
the protein sequence).

[0098] The described techniques (including SIFt-based methods, computer
implementations, systems, and databases) disclosed herein translate three-dimensional
intermolecular interactions into simple, linear information strings, thereby making it
possible to efficiently analyze large libraries of structures using mathematics and
informatics methods described herein. Although conceptually simple, the described
techniques provide a novel method of visualizing, organizing, analyzing, and mining 3D
structural information. The SIFt method organizes target molecule-ligand complex
structures into groups based on their interaction patterns. Intermolecular interactions
between target molecules and ligands are visualized and can be easily comprehended
using the heat-map of the SIFts for data visualization. Specifically, each line represents
one fingerprint (or SIFt), and each bit in the SIFt is colored or shaded according to its value.
Using the described techniques, conserved/unconserved interactions within or among
different sub-groups of structures (data analysis) can be compared and quantified. In
addition, by representing the target molecule-ligand complex structures using SIFts, a
query can be performed based upon structural interactions to select complexes (or
ligands) that satisfy predefined criteria (e.g., a certain interaction pattern or binding mode,
or even a particular interaction type occurring at a selected position), in a way similar to
querying a database (data mining).
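The heat-map layout described above (one line per fingerprint, one shaded cell per bit) can be approximated in plain text. The following is only a sketch: the function name and the characters used for set and unset bits are arbitrary choices for illustration, and a real visualization would typically use color.

```python
from typing import Dict, Sequence


def text_heat_map(fingerprints: Dict[str, Sequence[int]],
                  on: str = "#", off: str = ".") -> str:
    """Render fingerprints as rows of characters, one row per structure."""
    width = max(len(name) for name in fingerprints)  # pad labels to align rows
    lines = []
    for name, fp in fingerprints.items():
        row = "".join(on if bit else off for bit in fp)
        lines.append(f"{name.ljust(width)} {row}")
    return "\n".join(lines)
```

When the rows are ordered by a clustering result, blocks of identical columns make conserved interactions within a sub-group easy to spot.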

EXAMPLES

[0099] The following examples are provided to illustrate the practice of the described
techniques, and in no way limit the scope of the claims. 

[00100] Color versions of FIGS. 2A-5B can be found in Deng, Z.; Chuaqui, C.; Singh,
J. "Structural Interaction Fingerprint (SIFt): A novel method for analyzing
three-dimensional protein-ligand binding interaction," J. Med. Chem. 47: 337-344 (2004).
Examples 1-3

[00101] In Example 1, a set of molecular docking results was generated employing the
crystal structure of p38 in complex with a pyridinyl imidazole inhibitor SB203580 (PDB
accession code: 1a9u). See, e.g., Wang et al. Structure, 1998, 6(9), 1117-1128. The
docking program FlexX (see Rarey et al. J. Mol. Biol. 1996, 267, 470-489) in Sybyl
(version 6.8, Tripos, Inc., St. Louis, MO) was used to dock SB203580 onto the crystal
structure of p38. In this single ligand study, 100 poses of SB203580 generated by FlexX
were retained for subsequent analyses. The ligand binding site was defined using a cutoff
radius of 12 Å from the SB203580 ligand (i.e., the conformation in the crystal structure)
combined with a core sub-pocket cutoff distance of 4 Å. The FlexX scoring function was
used for scoring the docking. For each ligand being studied, ChemScore, Gscore, PMF
Score, Dscore, and Consensus Score were evaluated using the Cscore utility in Sybyl.
For references of the just-mentioned applications, see, e.g., Eldridge et al. J.
Comput.-Aided Mol. Des. 1997, 11, 425-445; Jones et al. J. Mol. Biol. 1997, 267, 727-748;
Muegge et al. J. Med. Chem., 1999, 42(5), 791-804; Gohlke et al. J. Mol. Biol. 2000,
295, 337-356; and Charifson et al. J. Med. Chem., 1999, 42(25), 5100-5109. FIG. 2A
shows the 100 poses generated in this experiment, which adopted different orientations
and positions in the ATP binding site of the kinase.

[00102] In Example 2, the experiment described was designed to evaluate the database
enrichment potential of SIFt by docking a diverse set of compounds spiked with known
actives onto the same target protein structure. To this end, 16 known p38 inhibitors were
combined with 1,000 small molecules with diverse chemical structures compiled
internally. These inhibitors were pyridinylimidazoles and analogs, covering the majority
of the p38 inhibitor families reported thus far, as previously discussed by Adams and Lee
(see Adams and Lee, Current Opinion Drug Discovery & Development, 1999, 2, 96-109).
These 1,016 compounds were docked onto the p38 structure (1a9u) using FlexX
distributed across 50 dual processor nodes of a Linux computing farm. For each ligand,
30 different poses generated from the docking experiment were retained, generating a
library of 30,480 (30 x 1,016) docked ligand structures for subsequent interaction
fingerprint analysis. The performance of database enrichment was measured by the
enrichment factor (EF), calculated based on the ability to recover 14 out of 16 (87.5%)
known inhibitors. For reference, see, e.g., Pearlman et al. J. Med. Chem. 2001, 44,
502-511. In both docking experiments, three-dimensional conformers of the ligands were
generated using OMEGA (OpenEye Scientific Software, Inc., Santa Fe, NM).
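The text does not spell out the enrichment factor formula. A commonly used definition, assumed here, compares the fraction of actives in a selected subset with the fraction of actives in the whole library. The sketch below uses the library figures from Example 2 (16 actives among 1,016 compounds, 30 poses each); the size of the selected subset in the usage note is a hypothetical value chosen only to make the arithmetic concrete.

```python
def enrichment_factor(actives_found: int, n_selected: int,
                      actives_total: int, n_total: int) -> float:
    """EF = (actives_found / n_selected) / (actives_total / n_total).

    This is a widely used definition and is an assumption here; the source
    only states that EF was based on recovering 14 of the 16 known inhibitors.
    """
    if n_selected <= 0 or n_total <= 0 or actives_total <= 0:
        raise ValueError("counts must be positive")
    return (actives_found / n_selected) / (actives_total / n_total)


def library_size(n_compounds: int, poses_per_compound: int) -> int:
    """Number of docked structures when every compound keeps the same number of poses."""
    return n_compounds * poses_per_compound
```

For instance, `library_size(1016, 30)` reproduces the 30,480 docked structures, and if the 14 recovered inhibitors were found among a hypothetical selection of 100 compounds, `enrichment_factor(14, 100, 16, 1016)` would give the corresponding EF.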
[00103] In Example 3, the SIFt-based method was also used to analyze a family of
experimentally determined structures. Specifically, a panel of 89 X-ray crystal structures
of protein kinase-ligand complexes was selected from the PDB. The selection criteria
included: 1) the structures must contain ligands (either ATP, GTP or other inhibitors)
present in their ATP-binding pockets; 2) most of the ATP binding site residues are visible
and present in the crystal structures. These 89 protein kinase-inhibitor complexes include
25 different kinases, covering 14 different protein kinase subfamilies as classified by
Hanks and Quinn. See Hanks and Hunter FASEB J. 1995, 9, 576-596 and Hanks and
Quinn Methods Enzymol. 1991, 200, 38-62. In all, the kinase structures contain 54
unique compounds representing a variety of chemical structures (see Table 1).
[00104] 
[00105] In each of Examples 1-3, the first step in the construction of SIFts is to
identify a list of selected positions or binding site residues that are common in all
complex structures being studied. The resulting panel of ligand binding site residues,
which covered all of the interactions occurring between the target protein and the ligands,
was then used as the common reference frame to construct the interaction fingerprints.
[00106] For a group of structures involving the same target protein (experiments such
as those described in Examples 1 and 2), the ligand binding site is defined as the list of
residues comprising the union of all residues involved in ligand binding over the entire
library of structures. For a group of structures involving different target molecules (such
as the experiment described in Example 3), additional structural and sequence
pre-alignment steps were required as described immediately below.
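For structures of the same target protein, the union described above is straightforward to compute. The sketch below assumes each structure has already been reduced to the list of residue numbers in contact with its ligand; the function name and the input layout are illustrative assumptions.

```python
from typing import Dict, Iterable, List


def common_binding_site(contacts_per_structure: Dict[str, Iterable[int]]) -> List[int]:
    """Union of all ligand-contacting residue numbers over a library of structures."""
    site = set()
    for residues in contacts_per_structure.values():
        site.update(residues)
    # Sorted so that the list can serve as a fixed reference frame.
    return sorted(site)
```

Every fingerprint in the library is then built over this same ordered residue list, which is what makes the fingerprints directly comparable.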

[00107] In Example 3, the crystal structure of murine PKA complexed with ATP and a
peptidic inhibitor PKI (PDB accession number: 1ATP; see Zheng et al. Acta Cryst. 1993,
D49, 362-365) was used as the reference model for structural and sequence alignment.
Initial amino acid sequence alignment of the catalytic cores of these kinases was taken
from the Protein Kinase Resources (see Smith et al. TIBS, 1997, 22(11), 444-446).
Structural alignment of the kinase structures was carried out manually and focused
primarily on the vicinity of the ATP binding sites. Based on the structural alignment
results, sequence alignments were carefully checked and adjusted if necessary, so that all
structurally equivalent residues match each other in the sequence alignment. After the
sequence and structural alignments, the residues of the non-murine PKA protein kinases
were renumbered and tallied to the murine PKA residue numbering system, resulting in a
uniform residue numbering system for all kinases analyzed. Identification of the list of
ligand binding sites was carried out as previously described using the new
PKA-equivalent residue numbers.
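The renumbering step can be sketched as a walk along a pairwise alignment between the reference sequence and another kinase. The code below assumes both aligned sequences are given as equal-length strings with "-" for gaps and that residue numbering is consecutive from a known start; the function name and these input conventions are assumptions for illustration, and the alignment itself (manual in the example above) is taken as given.

```python
from typing import Dict


def renumber_to_reference(ref_aligned: str, query_aligned: str,
                          ref_start: int, query_start: int,
                          gap: str = "-") -> Dict[int, int]:
    """Map query residue numbers to reference residue numbers.

    Only alignment columns where both sequences have a residue produce a
    mapping; query residues aligned to a gap have no reference equivalent.
    """
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned sequences must have the same length")
    mapping = {}
    ref_num = ref_start - 1
    query_num = query_start - 1
    for r, q in zip(ref_aligned, query_aligned):
        if r != gap:
            ref_num += 1
        if q != gap:
            query_num += 1
        if r != gap and q != gap:
            mapping[query_num] = ref_num
    return mapping
```

Applying the resulting mapping to each non-reference structure yields residue numbers in the reference numbering system, so that the binding-site list can be shared across proteins.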

[00108] In each of Examples 1-3, after all the ligand binding site residues were
identified and all the protein-ligand intermolecular interactions were calculated, the next
step was to classify these interactions, as described previously in the "Detailed
Description" section. Seven different types of interactions occurring at each binding
residue were extracted and classified from the AREAIMOL and HBPLUS results. The
inquiries were: 1) whether or not it is in contact with the ligand; 2) whether or not any
main-chain atom is involved in the contact; 3) whether or not any side-chain atom is
involved in the binding; 4) whether or not a polar interaction is involved; 5) whether or
not a non-polar interaction is involved; 6) whether or not the residue provides hydrogen
bond acceptor(s); 7) whether or not it provides hydrogen-bond donor(s). By doing so,
each residue was represented by a seven-bit-long bit string. The whole interaction
fingerprint of the complex was finally constructed by sequentially concatenating the
binding bit string of each binding site residue together, according to ascending residue
number order. Therefore, interaction fingerprints are of the same length and each bit in
the fingerprint represents presence or absence of a particular interaction at a particular
binding site.
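The construction just described can be sketched as follows. The sketch assumes the seven yes/no answers for each residue have already been derived from the contact and hydrogen-bond analysis; the class and function names are hypothetical, and residues of the common binding site with no recorded interactions are filled with zeros.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class ResidueInteractions:
    """Answers to the seven inquiries for one binding-site residue."""
    contact: bool = False
    main_chain: bool = False
    side_chain: bool = False
    polar: bool = False
    non_polar: bool = False
    hbond_acceptor: bool = False
    hbond_donor: bool = False

    def bits(self) -> List[int]:
        # Bit order follows inquiries 1) to 7) in the text above.
        return [int(self.contact), int(self.main_chain), int(self.side_chain),
                int(self.polar), int(self.non_polar),
                int(self.hbond_acceptor), int(self.hbond_donor)]


def build_sift(site_residues: Iterable[int],
               interactions: Dict[int, ResidueInteractions]) -> List[int]:
    """Concatenate seven-bit blocks in ascending residue-number order."""
    fingerprint: List[int] = []
    for residue in sorted(site_residues):
        block = interactions.get(residue, ResidueInteractions())
        fingerprint.extend(block.bits())
    return fingerprint
```

Because every complex is encoded over the same residue list, all fingerprints have length 7 times the number of binding-site residues.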

[00109] As described above in Example 1, the SIFt-based method was applied to
analyze the result of a typical docking study. This comprised 100 docking poses of a
small molecule inhibitor (SB203580) of p38, for which the crystal structure was known
(PDB entry 1a9u). The poses adopted diverse binding modes, varied in their orientations
and positions relative to the target protein and were complex to interpret visually (see
FIG. 2A). A total of 34 protein residues in the vicinity of the ATP binding pocket were
identified as the ligand binding site. These binding site residues were located in different
sub-regions of the kinase structure. SIFts were generated for all complexes, each of
which was composed of 238 (7 x 34) binary bits. The hierarchical clustering result of
these fingerprints is shown in FIG. 2B with the fingerprint Tanimoto similarity matrix
represented as a heat-map. The dendrogram revealed seven major clusters, labeled 1 to 7,
respectively. FIG. 2B shows that the clustering by their SIFt patterns has separated the
poses into different groups with distinct binding interactions. FIGS. 2C-2I depict the
structures of each major cluster, each of which was put in the same reference frame.
Interestingly, each of these seven clusters comprised poses having similar binding
modes with the receptor. Cluster 1 contained molecules similar to the known X-ray
crystal structure. Clusters 2-5 were similar in position but represented distinct binding
modes that resulted in dissimilar interactions with the Gly-rich loop and the catalytic loop
of p38. Finally, clusters 6 and 7 were outside the ATP binding site. Reassuringly, the
degree of variation between clusters observed visually in their binding interactions
appears to correlate to their distance in the dendrogram. For example, groups 1, 4, 6 and
7 each showed very little structural variation, as represented by tight clusters in the
dendrogram, whereas groups 3 and 5 showed relatively more diversity in their structures as
well as in their fingerprints. Furthermore, clusters 1 and 7 had very little in common and
were farthest from each other in the dendrogram. In summary, visual inspection confirms
that SIFt is useful in separating docking poses into distinct clusters that reveal distinct
binding interactions.
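As a rough illustration of clustering fingerprints by Tanimoto similarity, the sketch below computes pairwise distances (1 minus the Tanimoto coefficient) and merges clusters by average linkage until the closest pair is farther apart than a threshold. The linkage method, the threshold-based stopping rule, and the function names are assumptions for this example; the text does not specify which hierarchical clustering variant was used. In Example 1 each fingerprint would hold 7 x 34 = 238 bits.

```python
from typing import List, Sequence


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto coefficient of two equal-length binary fingerprints."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    # Assumed convention: two all-zero fingerprints are treated as identical.
    return both / either if either else 1.0


def cluster(fingerprints: Sequence[Sequence[int]], threshold: float) -> List[List[int]]:
    """Average-linkage agglomerative clustering on 1 - Tanimoto distances.

    Returns clusters as lists of indices into `fingerprints`.
    """
    n = len(fingerprints)
    dist = [[1.0 - tanimoto(fingerprints[i], fingerprints[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]

    def linkage(c1: List[int], c2: List[int]) -> float:
        # Mean pairwise distance between the members of two clusters.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, a, b = min(
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if d > threshold:
            break  # remaining clusters are too dissimilar to merge
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

Recording the merge distances instead of stopping at a threshold would give the information needed to draw a dendrogram.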

[00110] Traditionally, various scoring functions have been used to rank poses from
docking studies. Scoring function scores provide an estimate of the binding strength of
the compounds in order to identify the potential "good binders" from a large pool of
poses, such that a selection of top scoring compounds derived from a rank ordered list of
docked ligands will be enriched with active compounds. Scoring functions can be useful
in discriminating the poses in the different SIFt clusters (i.e., different binding modes). In
FIG. 3A, the first SIFt cluster, which is the closest to the true binding conformation,
showed a wide range in PMF scores, spanning from the best score (-70) to the worst (-4).
In fact, the majority of the poses in this cluster were no better in their PMF scores than
those in other SIFt clusters. In addition, the PMF scores for SIFt cluster 2 were just as
good as those for cluster 1, even though they adopt different, crystallographically
unobserved, interactions with the receptor. Other clusters also overlap with each
other in their docking scores. Clearly, PMF score is a poor scoring function for
discriminating between poses with the true binding mode and irrelevant poses in the experiment.
In an attempt to broaden the analysis of scoring functions, a consensus scoring function that
consists of five commonly used scoring functions was also examined (see FIG. 3B).
Many of the poses in clusters 1 - 3 had high Cscores (3 - 5), while clusters 3 - 7
overlapped significantly in the score range 0 - 2. This example further demonstrates the
fact that across a range of scoring functions, the energy-based approaches alone were
insufficient in distinguishing different binding modes, and in isolating those poses
corresponding to the observed binding mode.
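The overlap argument above can be checked numerically by summarizing the score range of each SIFt cluster and testing whether the ranges intersect. The sketch below is one simple way to do that; the function names are hypothetical, and any score values fed to it beyond the -70 and -4 quoted for cluster 1 would be illustrative rather than data from the experiment.

```python
from typing import Dict, Hashable, Sequence, Tuple


def score_ranges(scores: Sequence[float],
                 labels: Sequence[Hashable]) -> Dict[Hashable, Tuple[float, float]]:
    """Minimum and maximum docking score observed in each cluster."""
    ranges: Dict[Hashable, Tuple[float, float]] = {}
    for score, label in zip(scores, labels):
        low, high = ranges.get(label, (score, score))
        ranges[label] = (min(low, score), max(high, score))
    return ranges


def ranges_overlap(r1: Tuple[float, float], r2: Tuple[float, float]) -> bool:
    """True if two closed score intervals share at least one value."""
    return r1[0] <= r2[1] and r2[0] <= r1[1]
```

Clusters whose score ranges overlap cannot be separated by that scoring function alone, which is the situation the paragraph above describes for the PMF score.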

[00111] The application of the SIFt-based method was extended to other ensembles of 
structures involving different proteins and a diverse set of small molecules. In Example 
3, 89 known crystal structures of the protein kinase family that had been deposited in the 
Protein Databank were chosen. As mentioned above, they represent 14 different protein 
kinase subfamilies and 54 unique kinase small molecule ligands/inhibitors. The structure 
and sequence homology among protein kinases enabled us to analyze these structures 
using the SIFt-based approach. 

[00112] A total of 56 residues were identified as the ligand binding site (see FIG. 4A). 
The heat-map and the results from hierarchical clustering are shown in FIG. 4B. These 
interaction fingerprints were diverse, reflecting a high degree of variability in their 
binding interactions. Nevertheless, three major clusters can be identified from the 
dendrogram (see FIG. 4B). Although the results indicate that within each cluster there 
existed considerable variation in their interaction patterns, these three groups represented 
three distinct binding modes, as confirmed by careful inspection of their structures (see 
FIG. 4C). The first cluster has 4 members, containing structures of human p38 in 
complex with four different pyridinyl imidazole inhibitors: SB203580, SB216995, 
SB220025 and SB218655. The second cluster has 16 members, mostly human CDK2 in 
complex with different compounds with diverse chemical properties. The third cluster, 
which does not have a clear-cut boundary, is comprised of approximately 36 structures, 
and almost all of them are structures of different kinases in complex with ATP or ATP- 
analog inhibitors (GTP, AMPPNP, AMPPCP, AMP, ADP, etc.). Besides these three 
major clusters, about one-third of the 89 structures are either singletons or form tiny 
clusters. Interestingly, the three major clusters represent different grouping examples of 
protein-ligand complexes: the first one is made up of the same protein and chemically 
similar compounds; the second group contains the same protein but with a variety of 
ligands; the third cluster contains different proteins in complex with chemically similar 
ligands. 
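
By way of illustration only, the grouping of interaction fingerprints described above can be sketched in Python as follows. This is a minimal sketch, not the procedure actually used: it assumes that fingerprints are held as equal-length bit strings, that the distance between two fingerprints is one minus their Tanimoto coefficient, and that clusters are merged by average linkage until a distance cutoff is reached. The function names, the linkage rule, the cutoff value and the toy fingerprints are all assumptions introduced for this example.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient of two equal-length bit strings such as '0110'."""
    both = sum(1 for x, y in zip(a, b) if x == "1" and y == "1")
    either = sum(1 for x, y in zip(a, b) if x == "1" or y == "1")
    # Two all-zero fingerprints share no interactions; treat similarity as 0.
    return both / either if either else 0.0


def cluster(fingerprints, cutoff):
    """Agglomerative clustering with average linkage (assumed linkage rule).

    fingerprints: dict mapping a structure name to its bit string.
    cutoff: merging stops once the closest pair of clusters is farther
            apart than this Tanimoto distance.
    Returns a sorted list of sorted name lists, one list per cluster.
    """
    clusters = [[name] for name in fingerprints]

    def avg_distance(c1, c2):
        dists = [1.0 - tanimoto(fingerprints[p], fingerprints[q])
                 for p in c1 for q in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: avg_distance(clusters[ij[0]], clusters[ij[1]]))
        if avg_distance(clusters[i], clusters[j]) > cutoff:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Toy fingerprints (hypothetical, six bits each).
toy = {"s1": "110100", "s2": "110110", "s3": "001011", "s4": "001001"}
print(cluster(toy, cutoff=0.5))
```

In this toy input the first two and the last two fingerprints share most of their ON bits, so they fall into two separate groups, which mirrors how distinct binding modes separate in the dendrogram.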

[00113] Comparison of these fingerprints also revealed interactions that are conserved 
or highly variable among the structures. For instance, contact interactions with residue 57 
(in PKA numbering, within the Gly-rich loop) and residue 70 (also in PKA numbering) 
are strictly conserved among all of the 89 protein kinase-ligand structures. Other highly 
conserved interactions include contacts with residues 49, 72, 120, 121, 123, 173, 184, etc. 
(see FIG. 4B). In contrast, many other interactions are not conserved or are conserved only 
within a particular group. Detailed and systematic comparison of these structural profiles 
of the ATP binding sites of protein kinases will be presented elsewhere (Deng et al., 
manuscript in preparation). 
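
As a hedged illustration of how strictly conserved interactions could be read off a set of fingerprints, the short Python sketch below reports the bit positions that are ON in every fingerprint and those that are ON in only some. The function names and the toy bit strings are assumptions for this example; mapping a bit index back to a residue number and interaction type depends on how the fingerprint was laid out and is not shown.

```python
def conserved_positions(fingerprints):
    """Indices of bits that are ON ('1') in every fingerprint."""
    length = len(fingerprints[0])
    return [i for i in range(length)
            if all(fp[i] == "1" for fp in fingerprints)]


def variable_positions(fingerprints):
    """Indices of bits that are ON in at least one but not all fingerprints."""
    length = len(fingerprints[0])
    result = []
    for i in range(length):
        on_count = sum(1 for fp in fingerprints if fp[i] == "1")
        if 0 < on_count < len(fingerprints):
            result.append(i)
    return result


# Toy fingerprints (hypothetical).
toy = ["1101", "1001", "1011"]
print(conserved_positions(toy), variable_positions(toy))
```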

[00114] The SIFt-based method provides a new and powerful tool for lead discovery 
and lead optimization, enabling the search for molecules in a chemical database on the 
basis of expected interaction patterns to a target molecule. This application was 
specifically tested in Example 2, where a virtual screen for a set of 16 known p38 
inhibitors spiked into a diverse library of 1,000 commercially available compounds was 
performed. These p38 inhibitors were all ATP-competitive inhibitors and, despite 
representing varied chemical templates, had similarities to the pyridinylimidazole series 
(i.e., SB203580-like) for which the crystal structure of the complex was known (1a9u). 

[00115] These inhibitors and the random collection of chemical compounds were 
docked using FlexX onto the crystal structure of p38 (1a9u), and how well these known 
inhibitors could be enriched using commonly used scoring functions was assessed. These 
were then compared with the results from a SIFt-based enrichment involving filtering of 
the compounds based on their similarities in interaction patterns (measured by Tanimoto 
coefficient) to SB203580, a known pyridinylimidazole inhibitor of p38 for which the X- 
ray crystal structure was known. The rationale for SIFt-based enrichment is that these 16 
known inhibitors, being analogs of the pyridinylimidazole series, are expected to bind to 
p38 with similar overall binding modes. 
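
The similarity filter described above can be sketched, purely for illustration, as a ranking of docked poses by their Tanimoto coefficient to a reference fingerprint. The sketch assumes fingerprints are equal-length bit strings; the function names, pose names and bit strings are hypothetical and do not come from the experiment.

```python
def tanimoto(a, b):
    """Tanimoto coefficient: shared ON bits divided by ON bits in either string."""
    both = sum(1 for x, y in zip(a, b) if x == "1" and y == "1")
    either = sum(1 for x, y in zip(a, b) if x == "1" or y == "1")
    return both / either if either else 0.0


def rank_by_similarity(reference, poses):
    """Return pose names sorted by descending similarity to the reference.

    reference: bit string of the known binding mode (e.g. from a crystal structure).
    poses: dict mapping a pose name to its bit string.
    """
    return sorted(poses, key=lambda name: tanimoto(reference, poses[name]),
                  reverse=True)


# Hypothetical reference fingerprint and docked poses.
reference = "1101"
poses = {"pose_x": "1101", "pose_y": "0010", "pose_z": "1100"}
print(rank_by_similarity(reference, poses))
```

Poses at the top of the ranking reproduce most of the reference interactions; a cutoff on the coefficient or on the number of compounds harvested then defines the enriched subset.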
[00116] FIGS. 5A and 5B and Table 1 show the comparison of the database enrichment 
performances of the scoring functions with SIFt. ChemScore gave a modest enrichment 
factor of 5.4, and 166 compounds were harvested in order to identify 14 of the 16 known 
p38 inhibitors. PMF was slightly worse than ChemScore, with an enrichment factor of 
2.0. In addition, an analysis of the binding modes of the poses of the enriched p38 
inhibitors identified using these scoring functions showed that some of them differed 
substantially from the known crystal structure of SB203580, despite similarities in 
functionalities, suggesting that their binding modes obtained by ChemScore or PMF score 
were incorrect. This implies that the scoring functions were probably performing worse 
than the enrichment factors were indicating. In comparison, SIFt scored quite well, 
having to harvest only 24 compounds to be able to identify 14 of the 16 inhibitors, giving 
an enrichment factor of 37.0. Reassuringly, the highest scoring compound recovered by 
SIFt was SB203580, upon which the interaction fingerprint used to probe the database was 
based. Visual inspection of the binding modes of the p38 inhibitors identified using SIFt 
showed that all of their binding modes were similar to that of SB203580. A combination 
of SIFt and ChemScore led to a modest increase in enrichment (EF = 42.3). 

Table 2. Comparison of the database enrichment performances 
of SIFt with ChemScore and PMF Score 

Filtering Method      Enrichment Factor (EF)* 
PMF Score             2.0 
ChemScore             5.4 
SIFt                  37.0 
SIFt + ChemScore      42.3 

* EF is defined as $\mathrm{EF} = (\mathrm{Hits}_{sampled} / N_{sampled}) / (\mathrm{Hits}_{total} / N_{total})$, where $\mathrm{Hits}_{sampled}$ is the number of known 
inhibitors recovered in the sampled fraction of $N_{sampled}$ poses, and $\mathrm{Hits}_{total}$ is the number of known inhibitors 
present in the whole library of $N_{total}$ compounds. Here each EF was calculated based on the ability to 
recover 14 out of 16 known p38 inhibitors spiked into a random library of 1,000 compounds. 
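
As a worked check of the enrichment factor definition, the following Python sketch recomputes the ChemScore and SIFt values from the counts given in the text (14 of 16 inhibitors recovered after harvesting 166 and 24 compounds, respectively). It assumes that the whole library is counted as 1,016 compounds, i.e., the 16 inhibitors plus the 1,000 library compounds; the function and variable names are introduced only for this example.

```python
def enrichment_factor(hits_sampled, n_sampled, hits_total, n_total):
    """EF = (Hits_sampled / N_sampled) / (Hits_total / N_total)."""
    return (hits_sampled / n_sampled) / (hits_total / n_total)


# Assumed library size: 16 known inhibitors + 1,000 library compounds.
N_TOTAL = 1016
HITS_TOTAL = 16

ef_sift = enrichment_factor(14, 24, HITS_TOTAL, N_TOTAL)
ef_chemscore = enrichment_factor(14, 166, HITS_TOTAL, N_TOTAL)
print(round(ef_sift, 1), round(ef_chemscore, 1))  # prints: 37.0 5.4
```

Under this assumption the computed values agree with the tabulated enrichment factors for SIFt and ChemScore.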

Examples 4 and 5 

[00117] These two examples illustrate two other embodiments of SIFt implementation 
that incorporate chemical information about the ligands into their SIFt patterns. In 
Example 4, the information about core and variable groups (R-groups) of a compound is 
embedded into the SIFts (e.g., R-SIFts); in Example 5, the pharmacophoric features of 
the compound are used. 

[00118] In Example 4, the same set of 100 docking poses of SB203580 docked onto 
p38 used in Examples 1 and 2 was also used. The SB203580 molecule was decomposed 
into core, R-1, R-2 and R-3 groups as shown in FIG. 7A. Each non-hydrogen atom was 
assigned to one of these four groups. Four binary bits were used for each 
binding site residue, representing the core, R-1, R-2 and R-3, respectively. If a residue was 
in contact with (i.e., distance <= 4.0 Angstroms) a non-hydrogen atom belonging to a 
particular group, then the corresponding bit was turned ON (1); otherwise the bit remained 
OFF (0). The final SIFt pattern was constructed by concatenating the bit strings of all 
the binding site residues, in the same ascending residue number order 
as used in Example 1. 
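
The R-SIFt construction just described can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the code used in the example: residues are given as lists of heavy-atom coordinates keyed by residue number, each ligand heavy atom carries a group label, and a contact is any residue atom within 4.0 Angstroms of a ligand atom of that group. The data layout, names and toy coordinates are hypothetical.

```python
import math

GROUPS = ("core", "R-1", "R-2", "R-3")  # bit order within each residue
CUTOFF = 4.0  # Angstroms, contact distance stated in the text


def r_sift(residues, ligand_atoms):
    """Build an R-group fingerprint with four bits per binding site residue.

    residues: dict mapping residue number -> list of (x, y, z) heavy-atom coordinates.
    ligand_atoms: list of (group_label, (x, y, z)) for non-hydrogen ligand atoms.
    Returns the concatenated bit string in ascending residue number order.
    """
    bits = []
    for number in sorted(residues):
        for group in GROUPS:
            in_contact = any(
                math.dist(r_xyz, l_xyz) <= CUTOFF
                for r_xyz in residues[number]
                for label, l_xyz in ligand_atoms
                if label == group
            )
            bits.append("1" if in_contact else "0")
    return "".join(bits)


# Toy binding site with two residues and a three-atom ligand (hypothetical).
site = {53: [(0.0, 0.0, 0.0)], 30: [(10.0, 0.0, 0.0)]}
ligand = [("core", (1.0, 0.0, 0.0)),
          ("R-2", (12.0, 0.0, 0.0)),
          ("R-3", (50.0, 0.0, 0.0))]
print(r_sift(site, ligand))
```

Because every residue contributes a fixed block of four bits, the position of an ON bit identifies both the residue and the part of the ligand that contacts it.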

[00119] Grouping of the SIFt patterns was carried out using the same hierarchical 
clustering method as described in Example 1. 

[00120] FIG. 7A is the decomposition of molecule SB203580 into core (1) and three 
different R-groups, R-1 (2), R-2 (3) and R-3 (4). 

[00121] FIG. 7B is a hierarchical clustering of the SIFts of 100 SB203580 docking 
poses. The SIFts were constructed to represent different R-groups and the core of the 
molecule. Each selected position of the target molecule is made up of four binary bits, 
representing core, R-1, R-2, and R-3, respectively. Each SIFt is shown as one line in 
the heat map on the left of the figure, and only ON-bits are shown. The shades of gray, or 
colors, of the heat map blocks indicate different R-groups: red - core, blue - R-1, yellow 
- R-2, green - R-3. The right side of the figure shows the hierarchical clustering 
results on the fingerprints, including the dendrogram and the reorganized distance matrix. 
SIFts in the heat map were reorganized according to the order given by the hierarchical 
clustering. The shaded, or colored, bar on top of the SIFt heat map represents five kinase 
sub-regions in the fingerprints. These sub-regions, each shaded or colored differently, 
include the Gly-rich loop (G-loop), the region spanning from β3 to β4 (β3 to β4), β5 and 
the hinge region, the catalytic loop and the magnesium loop. 

[00122] FIGS. 7C and 7D show the structures of the poses in cluster 1 (7C) and cluster 2 
(7D), respectively, as identified by the hierarchical clustering of their R-SIFts (FIG. 7B), 
in the context of the p38 crystal structure (1a9u). The poses are shown in gray or cyan, 
and the co-crystal structure of SB203580 is shaded or colored according to atom types. 
The five kinase sub-regions that are in contact with the poses within the group are shaded 
or colored using the same shading or coloring scheme as described in FIG. 2B and FIG. 
7B. Compared to Example 1, the 7 R-SIFt groups are more tightly clustered, indicating that 
R-SIFt is more sensitive to differences in binding mode than the original SIFt comprised of the 7 
interaction bits that were used in Example 1. In addition, since different bits in the R-SIFt 
correspond to different segments of the molecule, it is straightforward to tell from 
the R-SIFt which part of the molecule interacts with which part of the target molecule. 
Therefore, R-SIFt can be used in virtual screening as a convenient tool to separate poses 
of different binding modes. 

[00123] In Example 5, the same set of SB203580 docking poses was used. This time, 
however, each atom of the molecule was assigned to one or more of seven chemical feature 
categories: hydrogen bond acceptor, hydrogen bond donor, hydrophobic, polar, negatively 
charged, positively charged, or aromatic ring atom. Some atoms fell into more than one 
category. When constructing the new SIFt patterns, seven 
binary bits were used to represent a binding site residue, each indicating one of the above 
seven chemical features. If a residue was within 4.0 Angstroms of any atom that 
belongs to a particular chemical feature category, then the corresponding bit was turned ON (1); 
otherwise it remained OFF (0). The final SIFt was constructed by concatenating the 
binary strings for all binding site residues, in the same order as used in Examples 
1 and 4. 
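
A comparable sketch for the chemical-feature variant is given below, again only as an illustration. It assumes that each ligand atom carries a set of one or more feature labels, that residues are lists of heavy-atom coordinates keyed by residue number, and that "within 4.0 Angstroms" includes the boundary value. The feature label strings, function name and toy coordinates are assumptions for this example.

```python
import math

# Bit order within each residue (label strings are assumed for illustration).
FEATURES = ("acceptor", "donor", "hydrophobic", "polar",
            "negative", "positive", "aromatic")


def feature_sift(residues, ligand_atoms, cutoff=4.0):
    """Build a seven-bits-per-residue fingerprint from ligand atom features.

    residues: dict mapping residue number -> list of (x, y, z) coordinates.
    ligand_atoms: list of (set_of_feature_labels, (x, y, z)); an atom may
                  belong to more than one feature category.
    """
    bits = []
    for number in sorted(residues):
        nearby_features = set()
        for features, l_xyz in ligand_atoms:
            if any(math.dist(r_xyz, l_xyz) <= cutoff
                   for r_xyz in residues[number]):
                # Every feature of a contacting atom switches its bit ON.
                nearby_features |= set(features)
        bits.extend("1" if f in nearby_features else "0" for f in FEATURES)
    return "".join(bits)


# Toy example (hypothetical): one residue, two ligand atoms.
site = {10: [(0.0, 0.0, 0.0)]}
ligand = [({"acceptor", "polar"}, (3.0, 0.0, 0.0)),
          ({"aromatic"}, (9.0, 0.0, 0.0))]
print(feature_sift(site, ligand))
```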
[00124] FIG. 8 is the hierarchical clustering of the SIFts of the same 100 docking 
poses of SB203580. Here the SIFt patterns contained 7 bits per selected position, each 
representing one of the seven chemical features of the molecule: red - hydrogen bond 
acceptor, blue - hydrogen bond donor, yellow - hydrophobic, green - polar, cyan - 
negatively charged, orange - positively charged, black - aromatic ring. These colors are 
represented in shades of gray in FIG. 8. The hierarchical clustering was based on the new 
SIFt patterns incorporating the chemical features of the molecules. 

[00125] In both Examples 4 and 5, the two different constructions of the SIFt pattern 
provided richer information about the chemical environment around the binding site. 
Hierarchical clustering of these two sets of new SIFts gave similar 
performance in terms of separating the different binding modes of the poses, and the results 
were comparable with those given by the previous construction of SIFt described in 
Example 1. This indicates that SIFt patterns incorporating information about 
the R-groups or the chemical features are both useful ways of representing the structural 
information, complementary to the previous construction of SIFt. 

Example 6 

[00126] This example demonstrates one of many potential applications of the 
interaction profile. A structural interaction profile represents the degree of similarity for 
an interaction occurring at a particular binding site among a group of structures. In this 
example, the value at each position is the average of all the interaction bit values 
occurring at this particular position within a group of SIFts. 
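
The averaging that defines the interaction profile can be illustrated with the short Python sketch below. It assumes that the SIFts in the group are equal-length bit strings; the function name and the toy fingerprints are hypothetical.

```python
def interaction_profile(fingerprints):
    """Average bit value at each position across a group of fingerprints.

    A value of 1.0 means the interaction occurs in every structure of the
    group; 0.0 means it occurs in none.
    """
    count = len(fingerprints)
    length = len(fingerprints[0])
    return [sum(int(fp[i]) for fp in fingerprints) / count
            for i in range(length)]


# Four toy fingerprints (hypothetical), e.g. one per co-crystal structure.
toy = ["1100", "1010", "1000", "1001"]
print(interaction_profile(toy))
```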

[00127] FIG. 9A shows the interaction profile generated from the SIFt patterns of four 
p38 crystal structures (1a9u, 1bl6, 1bl7, and 1bmk), each of which contains a different 
potent p38 inhibitor. The X-axis represents the p38 residue numbers of the interaction 
bits; the Y-axis represents the conservation scores of the interaction bits. The more 
conserved an interaction, the higher the value at this position. 

[00128] The above interaction profile was used to enrich p38 inhibitors from a large 
library. The idea behind the approach is that if a compound adopts an interaction pattern 
similar to that of previously known inhibitors (i.e., an interaction profile), then it is more 
likely to be a true inhibitor. The statistical Z score was used to measure how significantly 
the similarity between a SIFt and a target profile lies above a certain background. The Z score is 
defined as 

$$Z = \frac{x - \langle x_b \rangle}{\sigma}$$

where $x$ is the Tanimoto coefficient of the SIFt against the target profile, and $\langle x_b \rangle$ and $\sigma$ are 
the mean and standard deviation of the Tanimoto coefficients of all the SIFts in the 
background set, respectively, against the same target profile. The background set was 
used to construct a reference distribution upon which the comparisons were based. 
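
A minimal Python sketch of the Z score calculation is given below for illustration. The text does not state how the Tanimoto coefficient is evaluated against a real-valued profile, so the sketch assumes the continuous form, sum(a*b) / (sum(a*a) + sum(b*b) - sum(a*b)); it also assumes the population standard deviation of the background set. Both choices, together with the function names and toy data, are assumptions for this example.

```python
from statistics import mean, pstdev


def tanimoto_to_profile(bits, profile):
    """Continuous Tanimoto coefficient of a bit string against a profile (assumed form)."""
    a = [float(c) for c in bits]
    ab = sum(x * y for x, y in zip(a, profile))
    denom = sum(x * x for x in a) + sum(y * y for y in profile) - ab
    return ab / denom if denom else 0.0


def z_scores(fingerprints, profile):
    """Z = (x - <x_b>) / sigma, using all given fingerprints as the background set."""
    scores = [tanimoto_to_profile(fp, profile) for fp in fingerprints]
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation (assumption)
    return [(s - mu) / sigma if sigma else 0.0 for s in scores]


# Toy profile and background set (hypothetical).
profile = [1.0, 0.5, 0.0]
background = ["110", "001", "100"]
print(z_scores(background, profile))
```

Fingerprints that reproduce the highly conserved positions of the profile receive positive Z scores, while those that miss them fall below the background mean.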
[00129] A library comprised of sixteen known p38 inhibitors and 1,000 random 
compounds was docked onto the p38 target molecule. For each compound, 10 poses were 
retained for subsequent analysis. Poses were ranked according to their SIFt Z scores 
against the p38 interaction profile generated from the four co-crystal structures. The 
background set used in the Z score calculation included all of the docking poses. For each 
compound, the pose with the highest Tanimoto coefficient against the p38 profile was 
selected, and then all 1016 best poses were ranked according to their Z score. The 
database enrichment curves are shown in FIG. 9B. The X-axis is the percentage of the 
whole library collected, and the Y-axis is the percentage of active compounds harvested. 
For comparison, the enrichment performances of two conventional scoring functions 
(ChemScore and PMF Score) are also shown. 
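
The pose selection and enrichment curve described in this paragraph can be sketched as follows. The sketch takes one similarity score per pose as input and is illustrative only; because the Z score is an increasing linear function of the Tanimoto coefficient for a fixed background set, ranking the best poses by either quantity gives the same order. The function names, compound identifiers and scores are hypothetical.

```python
def best_pose_per_compound(pose_scores):
    """Keep the highest score seen for each compound.

    pose_scores: iterable of (compound_id, score), one entry per docked pose.
    Returns a dict mapping compound_id -> best score.
    """
    best = {}
    for compound, score in pose_scores:
        if compound not in best or score > best[compound]:
            best[compound] = score
    return best


def enrichment_curve(best, actives):
    """Points (% of library collected, % of actives harvested) in rank order."""
    ranked = sorted(best, key=best.get, reverse=True)
    found = 0
    points = []
    for rank, compound in enumerate(ranked, start=1):
        if compound in actives:
            found += 1
        points.append((100.0 * rank / len(ranked),
                       100.0 * found / len(actives)))
    return points


# Toy library of four compounds with one or two poses each (hypothetical).
poses = [("a", 0.2), ("a", 0.9), ("b", 0.5), ("c", 0.1), ("c", 0.3), ("d", 0.7)]
best = best_pose_per_compound(poses)
print(enrichment_curve(best, actives={"a", "b"}))
```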

[00130] From FIG. 9B it is clear that applying the SIFt-based 
Z score to select the best pose for each compound provided markedly better enrichment 
than that obtained using the standard ChemScore and PMF Score scoring functions. 

[00131] A number of embodiments have been described. Nevertheless, it will be 
understood that various modifications may be made. Accordingly, other embodiments are 
within the scope of the following claims. 